This paper investigates a possibly fundamental aspect of technological progress. If knowledge
accumulates as technology advances, then successive generations of innovators may face an increasing
educational burden. Innovators can compensate through lengthening educational phases and narrowing 
expertise, but these responses come at the cost of reducing individual innovative capacities, with impli- 
cations for the organization of innovative activity—a greater reliance on teamwork—and negative impli- 
cations for growth. Building on this “burden of knowledge” mechanism, this paper first presents six facts 
about innovator behaviour. I show that age at first invention, specialization, and teamwork increase over 
time in a large micro-data set of inventors. Furthermore, in cross-section, specialization and teamwork ap- 
pear greater in deeper areas of knowledge, while, surprisingly, age at first invention shows little variation 
across fields. A model then demonstrates how these facts can emerge in tandem. The theory further de- 
velops explicit implications for economic growth, providing an explanation for why productivity growth 
rates did not accelerate through the 20th century despite an enormous expansion in collective research 
effort. Upward trends in academic collaboration and lengthening doctorates, which have been noted in 
other research, can also be explained in this framework. The knowledge burden mechanism suggests that 
the nature of innovation is changing, with negative implications for long-run economic growth. 


1. INTRODUCTION 


Understanding innovation is central to understanding many important aspects of economics, from 
market structure to aggregate growth. Innovators, in turn, are a necessary input to any innovation. 
The innovator, wrestling with a creative idea, working with colleagues, and bringing an idea to 
fruition, seems the very heart of the innovative process. 

This paper places innovators at the centre of analysis and focuses on two simple obser- 
vations. First, innovators are not born at the frontier of knowledge; rather, they must initially 
undertake significant education. Second, the frontier of knowledge varies across fields and over 
time. This paper presents facts and theory that build on these observations, suggesting possibly 
fundamental consequences for the organization of innovative activity and, in the aggregate, for 
growth. 

The first observation concerns human capital and highlights a general distinction between 
human capital and other stock variables. Physical stocks can be transferred easily, as property 
rights, from one agent to another. Human capital, by contrast, is not transferred easily. The vessel 
of human capital—the individual—is born with little knowledge and absorbs information at a 
limited rate, so that training occupies a significant portion of the life cycle. The difficulty of 
transferring human capital has broad implications in economics;¹ in this paper, I focus on basic
implications for innovation. 


1. See, for example, Ben-Porath (1967) regarding life cycle earnings and Hart and Moore (1994) regarding debt
contracts. 
The second observation concerns the total stock of knowledge. In 1676, Isaac Newton wrote
famously to Robert Hooke, “If I have seen further it is by standing on ye sholders of Giants”.
Newton’s sentiment suggests that knowledge begets new knowledge, an observation that has 
been formalized in the growth literature (Romer, 1990; Jones, 1995a; Weitzman, 1998), with 
implications discussed extensively both there (e.g. Jones, 1995b; Kortum, 1997; Young, 1998)
and in the micro-innovation literature (e.g. Scotchmer, 1991; Henderson and Cockburn, 1996). 
This paper suggests a different, indirect implication of Newton's observation: if one is to stand on 
the shoulders of giants, one must first climb up their backs, and the greater the body of knowledge, 
the harder this climb becomes. 

If innovation increases the stock of knowledge, then the educational burden on succes- 
sive cohorts of innovators may increase. Innovators might confront this difficulty through two 
basic margins. First, they may choose to learn more. Second, they might compensate by choos- 
ing narrower expertise. Choosing to learn more will leave less time in the life cycle for innovation. 
Narrowing expertise, meanwhile, can reduce individual capabilities and force innovators to work 
in teams. Intriguing evidence along the lines of a “learning more” effect is shown in Table 1, 
which borrows from Jones (2005a) and documents a rising age at great achievement and rising 
doctoral age among Nobel Prize winners over the 20th century. To help motivate the specializa- 
tion margin, and the resulting need for teamwork, consider the invention of the micro-processor. 
As described by Malone (1995), the invention was by necessity the work of a team. The inspira- 
tion began with a researcher named Ted Hoff, who joined in the development with Stan Mazor. 
But as Malone writes, 


Hoff and Mazor didn’t really know how to translate this architecture into a working chip 
design .... In fact, probably only one person in the world did know how to do the next step. 
That was Federico Faggin .... 


The micro-processor was one person's inspiration but several people's invention. It is the
story of researchers with circumscribed abilities, working in a team, and it helps motivate the 
investigations of this paper. 

I begin below by presenting six facts. Using a rich patent data set (Hall, Jaffe and Trajtenberg, 
2001) together with the results of a new data collection exercise to determine the ages of 55,000 
inventors, I develop detailed patent histories for individuals. I show that (i) the age at first
invention, serving as a proxy measure for educational attainment; (ii) a measure of specialization; and


TABLE 1
Age trends among Nobel Prize winners

                                    Dependent variable
                                    Age at great        Age at highest
                                    achievement         degree
Year of great achievement           [illegible]         -
  (in 100's)                        (1.37)
Year of highest degree              -                   4.11***
  (in 100's)                                            (0.61)
Number of observations              544                 505
Time span                           1873-1998           1858-1990
Average age                         38.6                26.5
R²                                  0.027               0.084

Notes: (i) This table borrows from Jones (2005a). Age trends are measured in years per century.
S.E. are given in parentheses. (ii) Nobel Prize winners include all winners in Physics, Chemistry,
Medicine, and Economics. Age at great achievement is age when contribution is made (not later
age when prize is awarded).

*** Indicates 99% confidence level.
(iii) team size are all increasing over time at substantial rates (Figure 1). These trends are robust to
a number of controls and in particular are robust across a wide range of technological categories 
and research environments. An informal theory of the “burden of knowledge” might suggest 
these effects. Innovators, when faced with greater knowledge depth, might respond through both 
longer educational periods and greater specialization. 

In cross-section, I develop a measure of “knowledge depth” and show that (iv) teamwork and 
(v) specialization are greater in fields with deeper knowledge. Like the time series results, these 
cross-sectional patterns are robust to numerous controls and, furthermore, seem natural within an 
informal theory of the burden of knowledge. The final fact is then particularly surprising: (vi) the 
average age at first invention is strikingly similar across fields and does not vary with the depth of 
knowledge. This fact suggests a more nuanced mechanism, and the balance of the paper presents 
a model that ties these six facts together. I show how these facts can emerge in tandem, clarifying 
the influence of burden of knowledge on innovator behaviour and building precise implications 
for innovators' aggregate output and thus economic growth. 

In the model, innovators are specialists who interact with each other in the implementation 
of their ideas. The model introduces different areas of application (e.g. airplanes or drugs) within 
which innovators define their specialties. Achieving expertise requires an innovator to bring him- 
self or herself to the frontier of knowledge within some area of application, and the difficulty of 
reaching the frontier—the burden of knowledge—may vary across areas and over time. 

FIGURE 1
Basic time trends
[Panels: trend in age at "first" innovation (upper); trend in the probability of field jump, 1975-1993 (lower left); trend in team size (lower right)]

The central choice problem is that of career. At birth, each individual chooses to become
either a production worker or an innovator. Innovators must further choose specific knowledge
to learn. This choice is partly one of specialization, with the innovator trading off the costs and
benefits of broader education: more knowledge leads to increased innovative potential but also 
costs more to acquire. Crucially, however, the career choice is also one of application—what 
broad area of knowledge to enter (e.g. airplanes or drugs). In making this decision, innovators 
are attracted to areas with relatively low knowledge requirements and/or better opportunities, but 
they also seek to avoid crowding. Other things equal, the greater the duplication of innovators 
in a particular area of knowledge, the less expected income each will earn. This decision helps 
pin down innovator behaviour. In particular, arbitraging across different application areas, inno- 
vators allocate themselves to equate expected income across areas of research. Once income has 
been equalized, innovators find equivalent value in education and are only willing to undertake 
the same total education across widely different fields. It then falls to specialization to confront 
variation in the difficulty of reaching the knowledge frontier. Hence, the model predicts equiva- 
lent educational attainment in cross-section but increased specialization and teamwork in deeper 
areas of knowledge, as the facts suggest. 
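A deliberately crude numerical toy may help fix the cross-sectional logic just described. It is not the paper's model: the function `toy_field_outcomes`, the units of depth, and the rule that a team must jointly cover a field's frontier are all hypothetical assumptions. Holding education time fixed across fields, as income equalization implies in the text, the toy shows breadth shrinking and team size growing with knowledge depth.

```python
import math


def toy_field_outcomes(depths, education_time):
    """Toy sketch, not the paper's model: equal education, different depths.

    Hypothetical assumptions: every innovator spends the same `education_time`
    regardless of field, one innovator masters the share education_time / depth
    of a field's frontier knowledge, and a team must jointly cover that frontier.
    Returns {field: (education_time, breadth, team_size)}.
    """
    outcomes = {}
    for field, depth in depths.items():
        breadth = min(1.0, education_time / depth)      # narrower expertise in deeper fields
        team_size = math.ceil(depth / education_time)   # specialists needed to cover the frontier
        outcomes[field] = (education_time, breadth, team_size)
    return outcomes


for field, (educ, breadth, team) in toy_field_outcomes(
        {"shallow field": 2.0, "deep field": 6.0}, education_time=2.0).items():
    print(f"{field}: education={educ}, breadth={breadth:.2f}, team size={team}")
```

Education is identical in both fields by construction; only breadth and team size respond to depth, which is the qualitative pattern of facts (iv) through (vi).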

The time series behaviours and growth implications emerge in the dynamic features of the 
economy. The model marries the burden of knowledge mechanism to two other dimensions— 
population growth and technological opportunity—that are much discussed in the existing growth 
literature. First, a growing population allows the economy to continuously scale up innovative 
effort, keeping growth going even as individual contributions are in decline (as seen in Jones, 
1995a). In this model, population growth also plays a key role by increasing the market size 
for innovations and thus the marginal benefit of education. Second, technological opportuni- 
ties may rise or fall as the economy evolves. This feature captures, in reduced form, a broad 
range of arguments in the literature: both "fishing-out" arguments (e.g. Kortum, 1997) and more 
optimistic specifications where innovation is increasingly easy (e.g. Romer, 1990; Aghion and 
Howitt, 1992). In the model, changing technological opportunities, like population growth, also 
affect the marginal benefit of education. 

In this framework, the same forces that influence innovators' educational decisions also in- 
fluence long-run growth. Indeed, individuals’ educational decisions are made in the context of 
shifting knowledge burden, market size, and technological opportunities, producing detailed pre- 
dictions about innovator behaviour on the one hand and aggregate consequences on the other 
hand. I show that, along a balanced growth path, innovators will seek more education with time, 
with increasing specialization and teamwork driven by a rising burden of knowledge. The model 
can thus explain the time series patterns of innovator behaviour (Figure 1). Moreover, the bal- 
anced growth path is explicitly determined, with the burden of knowledge seen to act on growth 
similarly to the fishing-out effect of more standard models. Therefore, one may view the burden 
of knowledge as a micro-foundation for fishing-out-type effects on growth. Alternatively, if one 
is convinced that a fishing-out process operates independently, then the burden of knowledge is 
seen as an additional effect constraining the growth rate. 

The model can thus serve as a parsimonious explanation for the six facts about the
micro-behaviour of innovators identified in this paper. As discussed in Section 4, the model can further
explain several facts documented elsewhere, including upward trends in academic co-authorship 
and doctoral duration. Last, the model provides one consistent explanation for important aggre- 
gate data patterns. First, R&D employment in leading economies has been rising dramatically; 
yet, Total Factor Productivity (TFP) growth has been flat (Jones, 1995b). Second, the average
number of patents produced per R&D worker or R&D dollar has been falling over time across 
countries (Evenson, 1984) and U.S. manufacturing industries (Kortum, 1993). This absence of 
"scale effects" in growth is much debated in the growth literature. It can be understood through 
the model as a burden of knowledge effect, building growth on foundations that also support a 
consistent interpretation for the micro-evidence presented in this paper. 
This paper is organized as follows. Section 2 presents six central facts about the behaviour
of innovators. Section 3 presents the burden of knowledge model, which ties these facts together 
and considers the growth implications. Section 4 discusses further empirical applications and 
generalizations of the theory, together with concluding comments. 


2. ECONOMETRIC EVIDENCE 


This section presents a set of facts about the behaviour of innovators. Using an augmented patent 
data set, we will be able to examine three outcomes in particular: 


1. Team size 
2. Age at first innovation 
3. Specialization. 


The data are described in the following subsection. An investigation of basic time trends and 
cross-sectional results follows. 


2.1. Data 


I make extensive use of a patent data set put together by Hall et al. (2001). This data set contains 
every utility patent issued by the U.S. Patent and Trademark Office (USPTO) between 1963 and 
1999. The available information for each patent includes (i) the grant date and application year
and (ii) the technological category. The technological category is provided at various levels of
abstraction: the 414 main patent classes defined by the USPTO as well as more aggregated
36-category and 6-category measures created by Hall et al. (The 36-category and 6-category
measures are described in Table 5.) For patents granted after 1975, the data set additionally
includes (iii) every patent citation made by each patent and (iv) the names and addresses of the
inventors listed with each patent. There are 2.9 million patents in the entire data set, with 2.1
million patents in the 1975-1999 period (Figure 2).

Using the data available over the 1975-1999 time period, we can define two useful measures
directly: 


• Team size: The number of inventors listed with each patent.
• Time lag: The delay between consecutive patent applications from the same inventor.


For the latter measure, we identify inventors by their last name, first name, and middle 
initial, and then build detailed patent histories for each individual. 
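The history-building step can be sketched in Python as follows, assuming a hypothetical record layout of (patent id, last name, first name, middle initial, application year). Upper-casing names before matching is an added assumption and the function names are illustrative; real name matching would also have to cope with misspellings and with distinct inventors who share a name.

```python
from collections import defaultdict


def build_histories(records):
    """Group patent records by (last name, first name, middle initial), sorted by year.

    `records` holds (patent_id, last, first, middle_initial, application_year) tuples;
    the layout and the upper-casing of names are illustrative assumptions.
    """
    histories = defaultdict(list)
    for patent_id, last, first, middle, app_year in records:
        key = (last.upper(), first.upper(), middle.upper())
        histories[key].append((app_year, patent_id))
    return {key: sorted(entries) for key, entries in histories.items()}


def time_lags(history):
    """Years between consecutive applications in one inventor's sorted history."""
    years = [year for year, _ in history]
    return [later - earlier for earlier, later in zip(years, years[1:])]


records = [
    ("P3", "Smith", "John", "A", 1984),
    ("P1", "Smith", "John", "A", 1980),
    ("P2", "smith", "john", "a", 1982),
    ("P4", "Lee", "Ann", "B", 1990),
]
histories = build_histories(records)
print(time_lags(histories[("SMITH", "JOHN", "A")]))  # [2, 2]
```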
We can also define two more approximate measures that will be useful for analysis: 


• Tree size: The size of the citations “tree” behind any patent. Any given patent will cite a
number of other patents, which will in turn cite further patents, and so on. For the purposes
of cross-sectional analysis, the number of nodes in a patent’s backwards-looking patent
tree serves as a proxy measure for the amount of underlying knowledge.
• Field jump: The probability that an innovator switches technological areas between
consecutive patent applications. This can serve as a proxy measure for the specialization of
innovators. The more specialized you are, the less capable you are of switching fields.


FIGURE 2
Summary of available data
[Timeline: 1963 to 1975, 0.8 million patents, data include (i) and (ii); 1975 to 2000, 2.1 million patents, data include (i), (ii), (iii) and (iv)]
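The tree size measure can be computed with a breadth-first search over backward citations. This sketch assumes a hypothetical adjacency dictionary `cites` and counts each distinct ancestor once; whether the paper's measure counts an ancestor reached along several citation paths once or several times is not specified in this section. Because citations are only available for patents granted after 1975, any such tree is truncated at that date.

```python
from collections import deque


def tree_size(patent, cites):
    """Number of distinct patents reachable from `patent` through backward citations.

    `cites` maps a patent id to the ids it cites (hypothetical adjacency format).
    Each ancestor is counted once even when several citation paths lead to it,
    and the focal patent itself is excluded.
    """
    seen = {patent}
    queue = deque([patent])
    while queue:
        current = queue.popleft()
        for cited in cites.get(current, ()):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return len(seen) - 1


cites = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": []}
print(tree_size("A", cites))  # 4: B, C, D and E, with D counted once
```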


A limitation of this last measure is that since technological categories are assigned to patents 
and not to innovators, inferring an innovator’s specific field of expertise is difficult when inno- 
vators work in teams. For inventors who work in teams, the relation between specialization and 
field jump is in fact ambiguous: as inventors become more specialized and work in larger teams, 
they may jump as regularly as they did before. For the specialization analysis, we therefore focus 
on solo inventors, for whom increased specialization is associated with a decreased capability of 
switching fields. 
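A sketch of the field-jump rate for solo inventors follows. The history format (application year, technological class, team size) is hypothetical, and counting a pair as "solo" only when both consecutive patents list a single inventor is one reading of the restriction described above, not necessarily the exact rule used in the paper.

```python
def field_jump_rate(histories):
    """Share of consecutive solo-patent pairs in which the technology class changes.

    `histories` maps an inventor key to a list of (application_year, tech_class,
    team_size) tuples sorted by application year (hypothetical format). A pair is
    counted only when both patents list a single inventor.
    """
    jumps = pairs = 0
    for history in histories.values():
        for (_, class_a, size_a), (_, class_b, size_b) in zip(history, history[1:]):
            if size_a == 1 and size_b == 1:
                pairs += 1
                jumps += class_a != class_b  # True counts as 1
    return jumps / pairs if pairs else float("nan")


histories = {
    "inventor_x": [(1980, "A", 1), (1982, "A", 1), (1985, "B", 1)],
    "inventor_y": [(1990, "C", 1), (1991, "D", 2)],  # second patent is a team patent
}
print(field_jump_rate(histories))  # 0.5
```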

Finally, we would like to investigate the age at first innovation, as an outcome-based measure 
that delineates the pre-innovation and active innovation phases. Unfortunately, inventors’ dates of 
birth are not available in the data set, nor from the USPTO generally. However, using name and 
zip code information, it was possible to obtain birth date information for a large subset of inventors
through a public Web site, http://www.AnyBirthday.com, which uses public records and contains
birth date information for 135 million Americans. The
Web site requires a name and zip code to produce a match. Using a Java program to repeatedly 
query the Web site, it was found that of the 224,152 inventors for whom the patent data included 
a zip code, http://www.AnyBirthday.com produced a unique match in 56,281 cases. The age 
data subset and associated selection issues are discussed in detail in Appendix B. The analysis 
given there shows that the age subset is not a random sample of the overall innovator population. 
This caveat should be kept in mind when examining the age results, although it is mitigated by
three facts: the differences between the groups become small once other observables are taken
into account, controlling for these observables in the age regressions has little effect, and the results
for team size and specialization persist when examining the age subset. See the discussion in 
Appendix B. 


2.2. Time series results 


I consider the evolution over time of our three outcomes of interest. Figure 1 presents the basic 
data, while Tables 2 through 4 examine the time trends in more detail. 

Consider team size first. The lower right panel of Figure 1 shows that team size is increasing
at a rapid rate, rising from an average of 1.73 in 1975 to 2.33 at the end of the period, for
a 35% increase overall. Table 2 explores this trend further by performing regressions relating 
team size to application year, and we see that the time trend is robust to a number of controls. 
Controlling for compositional effects shows that any trends into certain technological categories 
or towards patents from abroad have little effect. Repeating the regressions separately for patents 
from domestic versus foreign sources shows that the domestic trend is steeper, though team size 
is rising substantially regardless of source. Repeating the time trend regression individually for 
each of the 36 different technological categories defined by Hall et al. shows that the upward trend 
in team size is positive and highly significant in every single technological category. Running the 
regressions separately by “assignee code” to control for the type of institution that owns the 
patent rights shows that the upward trend also prevails in each of the seven ownership categories 
identified in the data, indicating that the trend is robust across corporate, government, and other 
research settings, both in the U.S. and abroad.² In short, we find an upward trend in team size
that is both general and steep. 
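One way to see what controlling for compositional effects does is a within-category trend. By the Frisch-Waugh-Lovell theorem, demeaning year and team size within each technological category and regressing the demeaned values yields the same slope as an OLS regression that includes a dummy for every category. The function and the toy data below are illustrative assumptions; the exact specifications behind Table 2 are not reproduced here.

```python
from collections import defaultdict


def within_category_trend(rows):
    """OLS slope of team size on application year with category fixed effects.

    `rows` is a list of (category, application_year, team_size) tuples (hypothetical
    format). Demeaning within each category and regressing the demeaned values gives
    the same slope as including one dummy per category (Frisch-Waugh-Lovell).
    """
    groups = defaultdict(list)
    for category, year, size in rows:
        groups[category].append((year, size))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


rows = [("A", 1975, 1.0), ("A", 1980, 1.5), ("B", 1985, 3.0), ("B", 1990, 3.5)]
pooled = within_category_trend([("all", year, size) for _, year, size in rows])
print(f"within-category slope: {within_category_trend(rows):.2f} per year")
print(f"pooled slope:          {pooled:.2f} per year")
```

In the toy data, category B (larger teams) appears later than category A, so the pooled slope (0.18 per year) overstates the within-category slope (0.10 per year). For the actual data, the rise from 1.73 to 2.33 inventors per patent corresponds to (2.33 - 1.73) / 1.73 ≈ 0.35, the 35% increase cited above.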


2. Table B.2 describes the ownership assignment categories. 
Next, consider the age at first innovation. Note that we define an innovator's "first"
innovation as the first time they appear in the data set. Since we cannot witness individuals' patents
before 1975, this definition is dubious for (i) older individuals and (ii) observations of first
innovations that occur close to 1975. To deal with these two issues, I limit the analysis to those people
who appear for the first time in the data set between the ages of 25 and 35 and after 1985. The upper
panel of Figure 1 plots the average age over time, where we see a strong upward trend. The basic
time trend in Table 3 shows an average increase in age at a rate of 0.66 years per decade.
Controlling for compositional biases due to shifts in technological fields or team size has no effect
on the estimates. The results are also similar when examining different age windows.³ Analysis
of trends within technological categories shows that the upward trend in age is quite general.
Smaller sample sizes tend to reduce significance when the data are finely cut, but an upward age
trend is found in all six technology classes using the six-category measure of Hall et al., and in 29 of
36 categories when using their 36-category measure. The upward age trend also persists across
all patent ownership classifications.
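The sample restriction and the basic trend can be sketched as an ordinary least squares slope of age on application year, rescaled to years per decade. The (birth year, first application year) format is hypothetical, age is approximated as the difference of the two years, and treating the age bounds and the 1985 cutoff as inclusive is an assumption. On this scale, a slope of about 0.066 per year corresponds to the 0.66 years per decade reported above.

```python
def age_trend_per_decade(inventors, min_age=25, max_age=35, min_year=1985):
    """OLS slope of age at first appearance on application year, in years per decade.

    `inventors` is a list of (birth_year, first_application_year) pairs (hypothetical
    format). Only first appearances between `min_age` and `max_age` (inclusive) and in
    or after `min_year` are kept; the inclusiveness of these cutoffs is assumed.
    """
    sample = [
        (year, year - birth)
        for birth, year in inventors
        if min_age <= year - birth <= max_age and year >= min_year
    ]
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    return 10 * sxy / sxx  # per-year slope rescaled to a decade


inventors = [(1960, 1988), (1962, 1992), (1961, 1996), (1970, 1990), (1950, 1980)]
print(age_trend_per_decade(inventors))  # 8.75 (toy data; the last two are filtered out)
```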

Note that the age at first invention serves as an outcome-based measure to delineate the
education and innovation phases of the life cycle. A possible contaminating factor is the duration it takes
to produce an innovation (the age at first invention is the sum of age at completion of education
plus the time lag until the first invention). However, in results reported elsewhere (Jones, 2005b),
the time lags between an inventor's inventions are short, do not trend over time, and vary only
modestly across fields. Thus, the age at first invention appears to track the end of the
educational phase with little error. Some related evidence regarding doctoral duration is considered in
Section 4.⁴

Now we turn to specialization. The specialization measure considers the probability that an 
innovator switches fields between consecutive innovations. Before examining the raw data, it is 
necessary to consider a truncation problem that may bias us towards finding increased specializa- 
tion over time. The limited window of our observations (1975-1999) means that the maximum 
possible time lag between consecutive patents by an innovator is largest in 1975 and smallest in 
1999. This introduces a downward bias over time in the lag between innovations. It is intuitive, 
and it turns out in the data, that people are more likely to jump fields the longer they go between 
innovations.⁵ Mechanically shorter lags as we move closer to 1999 can therefore produce an
apparent increase in specialization. To combat this problem, I make use of a conservative and 
transparent strategy. I restrict the analysis to a subset of the data that contains only consecutive 
innovations that were made within the same window of time. In particular, we examine only the 
consecutive innovations when the second application comes within 3 years of the first.
Furthermore, we examine only the innovations that were granted within 3 years of the application.⁶ This
strategy eliminates the bias problem at the cost of limiting our data analysis to the 1975-1993


3. The table reports results for the 23-33 age window as well. In results not reported, I find that the trend is similar
across subsets of these windows: ages 23-28, 25-30, 31-35, and so forth. Furthermore, there is no upward trend when
examining age windows beginning at age 35. 

4. Doctoral age is also an imperfect delineation between education and innovation phases because doctorates 
explicitly require innovative research that begins well before the awarding of the degree. 

5. An interpretation consistent with the spirit of the burden of knowledge concept is that people need time to 
reeducate themselves when they jump fields; hence, a field jump is associated with a larger time lag. 

6. Examining only the patents where the second application came within 3 years limits our analysis to those cases 
where the first application was made before 1997. However, a second issue is that patents are granted with a delay—2 
years on average—and only patents that have been granted appear in the data. For a first patent applied for in 1996, it 
is therefore much more likely that we will witness a second patent applied for in 1997 than one applied for in 1999,
which introduces further downward bias in the data. To deal completely with the truncation problem, we therefore further limit
ourselves to patents that were granted within 3 years of their application, which means that we only examine the period 
1975-1993. 


(c) 2009 The Review of Economic Studies Limited 




KNOWLEDGE BURDEN MECHANISM 


JONES 


TABLE 3
Trends in age at first innovation

Dependent variable: age at application

                              (1)          (2)          (3)          (4)          (5)          (6)          (7)
Application year              0.0657       0.0666       0.0671       [illegible]  0.0687       0.0530       [illegible]
                              (0.0095)     (0.0095)     (0.0095)     (0.0099)     (0.0097)     (0.0107)     (0.0109)
Technological field controls
  Broad                       -            Yes          -            -            -            -            -
  Narrow                      -            -            Yes          Yes          Yes          -            Yes
Assignee code                 -            -            -            Yes          Yes          -            Yes
Team size                     -            -            -            -            -0.0630      -            [illegible]
                                                                                  [illegible]               (0.0306)
Age range                     25-35        25-35        25-35        25-35        25-35        23-33        23-33

Notes: (i) Regressions are ordinary least squares, with S.E. in parentheses. All regressions observe only
those innovators for whom we have age data and who appear for the first time in the data set after 1985.
Specifications (1) through (5) consider those innovators who appear for the first time between ages 25 and
35; specifications (6) and (7) consider those who appear for the first time between ages 23 and 33.
(ii) Broad technological controls include dummies for each of the six categories in the most aggregated
technological classification of Hall et al.; narrow technological controls include dummies for each category
in their 36-category measure. The upward age trend persists when the regressions are run separately within
each broad technology class of Hall et al. Upward trends are also found in 29 of the 36 categories of the
narrow classification, although sample sizes drop considerably when the data are divided into these 36
categories; the one case of a significant downward trend is category no. 23 (Computer Peripherals).
(iii) Assignee code controls are seven dummy variables that define who holds the rights to the patent;
Table B.2 describes the assignee codes. The upward age trend persists when the regressions are run
separately for each assignee code.




period and making our results applicable only to the subsample of “faster” innovators.⁷ The lower
left panel of Figure 1 shows the trend from 1975 to 1993.
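The truncation correction amounts to a filter on consecutive patent pairs. The sketch below assumes a hypothetical (application year, grant year) format and reads "within 3 years" as a difference of at most 3 years; both are assumptions rather than details confirmed by the text.

```python
def within_window_pairs(history, window=3):
    """Consecutive patent pairs that pass both truncation restrictions.

    `history` is one inventor's list of (application_year, grant_year) tuples sorted
    by application year (hypothetical format). A pair is kept when the second
    application comes at most `window` years after the first and each patent was
    granted at most `window` years after its own application.
    """
    kept = []
    for (app_a, grant_a), (app_b, grant_b) in zip(history, history[1:]):
        if (app_b - app_a <= window
                and grant_a - app_a <= window
                and grant_b - app_b <= window):
            kept.append(((app_a, grant_a), (app_b, grant_b)))
    return kept


history = [(1980, 1982), (1982, 1984), (1984, 1988), (1985, 1986)]
print(within_window_pairs(history))  # [((1980, 1982), (1982, 1984))]
```

The second and third pairs are dropped because the 1984 application was granted four years after it was filed.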

Table 4 considers the trend in specialization with and without this corrective strategy. The 
results there, together with the graphical presentation in Figure 1, indicate a smooth decrease in 
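the probability of switching fields. The decline is again quite steep. The central estimate for
the trend, -0.003 per year, implies a fall in the probability of switching fields of about 6% of its average level every 10 years, which we can interpret as a 6% increase in specialization per decade. Note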
that our main results, and Figure 1, use the 414-category measure for technology to determine 
whether a field switch has occurred. This is our most accurate measure of technological field (the 
measures of Hall et al. are aggregations of it), but the results are not influenced by the choice 
of field measure. Note in particular that the percentage trend is robust to the choice of the 6- 
category, 36-category, or 414-category measure for technology—the trend is approximately 6% 
per decade for all three. Including controls for U.S. patents, the application time lag, ownership 
status, and the technological class of the initial patent has little effect. Furthermore, examining 
for trends within each of the 36 categories of Hall et al., we find that the probability of switching 
fields is declining in 34 of the 36; the decline is statistically significant in 20. In sum, we see a 
robust and strongly decreasing tendency for solo innovators to switch fields. 
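The conversion from the annual trend to a percentage change per decade is simple arithmetic: multiply by 10 and divide by the average probability of switching fields. The baseline of 0.5 used here is an illustrative assumption chosen to match the 6% figure; the sample mean is not reported in this section, and the category-level means in Table 5 range from 0.29 to 0.48.

```python
def percent_change_per_decade(annual_trend, baseline):
    """Express an annual trend in a probability as a percent of `baseline` per decade."""
    return 100 * (10 * annual_trend) / baseline


# -0.003 is the central trend estimate from the text; 0.5 is a hypothetical baseline.
print(round(percent_change_per_decade(-0.003, 0.5), 1))  # -6.0
```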


2.3. Cross-section results 


For a first observation of the data in cross-section, Table 5 presents a simple comparison of 
means across the 6 and 36 technological categories of Hall et al. (2001). The middle column in 
the table presents the mean age at first innovation, and the data show a remarkable consistency 
across technological categories. In 31 of the 36 categories, an innovator’s first innovation tends 
to come at age 29. The lowest mean age among the 36 categories is 28.8, and the highest, an
outlier that relies on only 12 observations, is 31.1. The table shows that regardless of whether
the invention comes in “Nuclear & X-rays”, “Furniture, House Fixtures”, “Organic Compounds”,
or “Information Storage”, the mean age at first innovation is nearly the same.⁸

The next columns of the table consider the average team size. Here, we see large differences 
across technological areas. The largest average team size, 2.91 for the “Drugs” subcategory, is
over twice that of the smallest, 1.41 for the “Amusement Devices” subcategory.

Finally, the last columns of the table consider the probability that a solo innovator will 
switch subcategories between innovations. Here, as with team size and unlike the age at first 
innovation, we see large differences across technological areas. This variation is again consistent 
with the predictions of the model. At the same time, this basic, cross-sectional variation in the 
probability of field jump is difficult to interpret: the probability of field jump will be tied to how 
broadly a technological category happens to be defined, which may vary to a large degree across 
categories. 

I can go further using a direct measure of the quantity of knowledge underlying a patent. In 
particular, I can analyse in cross-section what an increase in the knowledge measure implies for 
our outcomes of interest. 


7. These restrictions maintain a significant percentage of the original sample. For example, of the 111,832 people 
who applied successfully for patents in 1975, 81,955 of them received a second patent prior to 2000. Of these 81,955 
people for whom we can witness a time lag between applications, 79.8% made their next application within 3 years. Of
those, 88.5% were granted both patents within 3 years of application.
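The sample retention implied by footnote 7 can be checked with a few lines of arithmetic. The shares are the rounded percentages reported in the footnote, so the results are approximate.

```python
applicants_1975 = 111_832       # people who applied successfully for patents in 1975
with_second_patent = 81_955     # of those, people granted a second patent before 2000
share_next_within_3 = 0.798     # next application within 3 years
share_granted_within_3 = 0.885  # both patents granted within 3 years of application

retained = with_second_patent * share_next_within_3 * share_granted_within_3
print(round(retained))                              # 57879 people
print(round(retained / with_second_patent, 3))      # 0.706 of those with a second patent
print(round(retained / applicants_1975, 3))         # 0.518 of all 1975 applicants
```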

8. These results can also be considered in a regression format. Pooling cross-sections and using application year 
dummies to take care of trends, the results are extremely similar. One can also adjust the age at first innovation by
subtracting category-specific estimates of the time lag to get a closer estimate of an individual’s education. One can also 
observe different age windows. The result that ages are nearly identical across fields is highly robust. 
TABLE 5

Mean differences across technological categories

Technological classification              Age at first      Inventors per       Probability of
(Hall et al., 2001)                       innovation        patent              field jump
6-category / 36-category           Code   Obs.    Mean      Obs.       Mean     Obs.      Mean

Chemical (1)
  Agriculture, Food, Textiles      11     ?       ?         18,351     2.45     2461      0.48
  Coating                          12     52      29.2      32,820     2.24     4336      0.64
  Gas                              13     17      29.9      10,047     1.96     1692      0.59
  Organic Compounds                14     49      29.4      78,188     2.57     8010      0.32
  Resins                           15     43      29.4      74,993     2.52     7695      0.37
  Miscellaneous-Chemical           19     320     29.3      214,854    2.23     29,721    0.43
  Entire category                         502     29.4      429,253    2.35     53,915    0.43
Computers & Communications (2)
  Communications                   21     270     29.2      98,046     1.99     15,107    0.41
  Computer Hardware & Software     22     169     29.8      83,094     2.26     10,259    0.44
  Computer Peripherals             23     ?       ?         ?          2.37     2777      0.51
  Information Storage              24     45      29.0      43,182     2.21     6778      0.40
  Entire category                         522     29.4      247,131    2.15     34,916    0.42
Drugs & Medical (3)
  Drugs                            31     74      29.9      77,210     2.91     7181      0.25
  Surgery & Medical Instruments    32     276     29.8      62,192     1.86     12,385    0.29
  Biotechnology                    33     48      30.5      29,638     2.75     2223      0.36
  Misc.-Drugs & Medical            39     71      29.2      14,356     1.66     3488      0.35
  Entire category                         469     29.8      183,396    2.43     25,277    0.29
Electrical & Electronic (4)
  Electrical Devices               41     116     29.2      65,500     1.77     12,817    0.48
  Electrical Lighting              42     88      29.6      33,769     1.97     5739      0.43
  Measuring & Testing              43     117     29.?      62,021     1.94     10,083    0.51
  Nuclear & X-rays                 44     56      29.5      32,402     2.08     4681      0.50
  Power Systems                    45     124     29.4      73,849     1.94     13,086    0.51
  Semiconductor Devices            46     51      29.3      47,123     2.25     7207      0.34
  Misc.-Electrical                 49     103     29.1      52,206     1.97     9004      0.51
  Entire category                         655     29.3      366,870    1.97     62,617    0.48
Mechanical (5)
  Materials Processing & Handling  51     243     29.4      108,873    1.79     21,821    0.48
  Metal Working                    52     89      28.8      63,669     2.12     10,454    0.54
  Motors, Engines & Parts          53     86      29.4      78,585     1.85     16,221    0.41
  Optics                           54     57      29.0      51,102     2.15     8159      0.37
  Transportation                   55     279     28.9      61,501     1.66     12,004    0.45
  Misc.-Mechanical                 59     458     29.1      103,855    1.64     22,513    0.49
  Entire category                         1212    29.1      467,585    1.84     91,172    0.46
Others (6)
  Agriculture, Husbandry, Food     61     248     29.1      44,718     1.76     7644      0.40
  Amusement Devices                62     267     29.5      22,227     1.41     4273      0.37
  Apparel & Textile                63     204     29.2      35,001     1.57     7616      0.37
  Earth Working & Wells            64     98      29.7      29,645     1.69     6599      0.36
  Furniture, House Fixtures        65     332     29.1      43,499     1.42     9416      0.50
  Heating                          66     55      30.0      28,267     1.76     6065      0.48
  Pipes & Joints                   67     46      29.2      18,444     1.58     4448      0.61
  Receptacles                      68     297     29.4      43,353     1.51     10,105    0.47
  Misc.-Others                     69     848     29.2      179,925    1.74     35,342    0.48
  Entire category                         2415    29.3      445,079    1.65     91,508    0.46


Notes: (i) Age at first innovation includes observations of those innovators who appear after 1985 in the data set and
between the ages of 23 and 33. Results are similar, with higher mean and even less variance, for 25- to 35-year-olds.
(ii) Probability of field jump is the probability of switching categories for solo innovators, using the 36-category measure.
Obs., observation.
For a continuous measure of the quantity of knowledge, I use the logarithm of the number
of nodes (i.e. patents) in the citation tree behind any patent.⁹ As before, there is a truncation issue
that needs to be considered: the data set does not contain citation information for patents issued 
before 1975, so we observe only the recent part of the tree. The measure of underlying knowledge
is then noisier the closer we are to 1975, and I therefore focus on cross-sections later in the 
time period. A second issue is that the average tree size and its variance grow extremely rapidly 
in the time window, which makes it difficult to compare data across cross-sections without a 
normalized measure. Two obvious normalizations are (1) a dummy for whether the tree size is 
greater than the within-period median, and (2) the difference from the within-period mean tree 
size, normalized by the within-period standard deviation. Results are reported using the latter 
definition, as it is informationally richer, though either method shows similar results. 
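The two normalizations can be sketched in a few lines of Python. This is an illustration, not the paper's code: it assumes a hypothetical input of `(application_year, node_count)` pairs, uses the population standard deviation within each year (the text does not say which estimator is used), and maps a year with no dispersion to 0. The sample data are invented.

```python
import math
import statistics
from collections import defaultdict


def normalized_tree_size(patents):
    """Within-year z-score of log(node count).

    patents: list of (application_year, node_count) pairs, node_count >= 1.
    Returns one normalized value per patent, in input order.
    """
    logs_by_year = defaultdict(list)
    for year, nodes in patents:
        logs_by_year[year].append(math.log(nodes))
    # Assumption: population standard deviation within each year.
    moments = {
        year: (statistics.mean(vals), statistics.pstdev(vals))
        for year, vals in logs_by_year.items()
    }
    out = []
    for year, nodes in patents:
        mean, sd = moments[year]
        # A year with no dispersion carries no ranking information; map it to 0.
        out.append((math.log(nodes) - mean) / sd if sd > 0 else 0.0)
    return out


def above_median_dummy(patents):
    """1 if the raw node count exceeds the within-year median, else 0."""
    counts_by_year = defaultdict(list)
    for year, nodes in patents:
        counts_by_year[year].append(nodes)
    medians = {year: statistics.median(v) for year, v in counts_by_year.items()}
    return [1 if nodes > medians[year] else 0 for year, nodes in patents]


sample = [(1990, 10), (1990, 100), (1990, 1000), (1995, 50), (1995, 50)]
print(normalized_tree_size(sample))
print(above_median_dummy(sample))
```

Because the log is monotonic, the median dummy classifies patents the same way whether it is computed on raw or logged counts, which is the invariance footnote 9 relies on.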

Table 6 examines the relationship between team size and tree size in pooled cross-sections, 
with and without various controls. I add a quadratic term for the variation in tree size to help
capture evident curvature, and we see that team size rises at an increasing rate as the measure 
of knowledge depth increases. For innovations with larger citation trees, the rise in team size 
is particularly strong. With very deep knowledge trees, an increase of 1 S.D. in the tree size is 
associated with an average increase in team size of one person. The table shows that the cross- 
sectional relationship holds for domestic-source and foreign-source patents and when controlling 
for technological category, so that the variation appears both within fields and across them. Tech- 
nological controls are perhaps best left out, however, since the variations in mean tree size across 
technological category may be equally of interest. Finally, we might be concerned that bigger 
teams simply have a greater propensity to cite, which results in larger trees. This concern proves 
unwarranted. Controlling for the variation in the direct citations made by each patent, we find
that the relationship actually strengthens. In fact, we see that bigger teams tend to cite less. This
result gives us greater faith in the causal direction implied by the regressions.
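The curvature described above can be read off a quadratic specification. As a sketch, with $x_p$ the normalized tree-size variable for patent $p$ and with $a_0, a_1, a_2$ as illustrative placeholder labels (not the estimates in Table 6), the slope of team size in tree size is:

```latex
\text{TeamSize}_p = a_0 + a_1 x_p + a_2 x_p^2 + \text{controls}_p + u_p,
\qquad
\frac{\partial\,\mathrm{E}[\text{TeamSize}_p \mid x_p]}{\partial x_p} = a_1 + 2 a_2 x_p .
```

With $a_2 > 0$ the slope rises with $x_p$, so the effect of a 1 S.D. increase in tree size is largest for patents with the deepest trees, which is the pattern reported in the text.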

Next, we turn to the age at first innovation. Table 7 examines, in pooled cross-sections, the 
relationship between age and knowledge for those individuals for whom we can be confident that 
they are innovating for the first time (see discussion above). The general conclusion from the 
table is that we must work hard to find a relationship, and at its largest, it is very small. It is not 
robust to the specific age window, is reduced when controlling for the technological category, 
and disappears when controlling for the number of direct citations made. Taking a coefficient of
0.1 as the maximum estimate from the table, we find that an increase of 1 S.D. in the knowledge
measure leads to a 0.1-year increase in age. This coefficient may be attenuated given that our
proxy measure of knowledge is noisy, but I conclude that there is at most only a weak relationship 
between the amount of knowledge underlying a patent and the average age at first innovation. 
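The attenuation point is the standard errors-in-variables result: if the observed regressor equals true knowledge plus independent noise, the OLS slope shrinks toward zero by the factor var(true)/(var(true) + var(noise)). The simulation below illustrates this with hypothetical parameters (a true slope of 0.1 years per S.D., equal signal and noise variances); it is not an estimate from the patent data.

```python
import random


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=0.1, noise_sd=1.0, n=100_000, seed=0):
    """Return (slope on true knowledge, slope on noisy proxy).

    All parameters are hypothetical: true knowledge is standard normal,
    and age = 29 + beta * knowledge + N(0, 1) noise.
    """
    rng = random.Random(seed)
    knowledge = [rng.gauss(0.0, 1.0) for _ in range(n)]
    age = [29.0 + beta * k + rng.gauss(0.0, 1.0) for k in knowledge]
    proxy = [k + rng.gauss(0.0, noise_sd) for k in knowledge]
    return ols_slope(knowledge, age), ols_slope(proxy, age)


true_slope, proxy_slope = simulate_attenuation()
# Theory: the proxy slope is about beta / (1 + noise_sd**2) = 0.05 here.
print(round(true_slope, 3), round(proxy_slope, 3))
```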

Finally, Table 8 considers the relationship between the probability of field jump and the 
knowledge measure. The table shows a robust negative relationship: solo innovators are less 
likely to jump fields when their initial patent has a larger node count. If we identify a larger node 
count with a deeper area of knowledge, then this negative correlation is consistent with the idea 
that deeper areas of knowledge see more specialization. The results are robust to the inclusion 
of many controls, including controls for technological field, foreign or domestic source of the 
patent, and the time lag between the two patents. The results are also strengthened when examining 


9. The distribution of the raw node count within cross-section is highly skewed—the mean is far above the median, 
so that upper tail outliers can dominate the analysis. I therefore use the natural log of the node count, which serves 
to contain the upper tail. A (loose) theoretical justification is knowledge depreciation: distant layers of the tree are 
less relevant to a patent than nearer layers, so there is a natural diminishing impact as nodes grow more distant. The 
diminishing impact of the large, distant layers, which dominate the node counts, is captured loosely by taking logs. 
Because the basic results are similar when we use the median-based measure of knowledge depth (a dummy for
whether the raw node count is above or below the median, which is independent of any monotonic transform of the node
count), we can be reasonably comfortable with the log measure.
TABLE 8

Field jump versus tree size

Dependent variable: probability of switching technological field

                                     (1)        (2)        (3)        (4)        (5)        (6)
Normalized variation in tree size  -0.0072    -0.0074    -0.0059    -0.0095    -0.0144    -0.0184
                                   (0.0008)   (0.0008)   (0.0008)   (0.0009)   (0.0012)   (0.0017)
Foreign patent                        -       -0.0125    -0.0108    -0.0129    -0.0135     0.0032
                                              (0.0018)   (0.0018)   (0.0018)   (0.0023)   (0.0032)
Time between applications             -          -        0.0226     0.0232     0.0215     0.0143
                                                         (0.0004)   (0.0004)   (0.0012)   (0.0017)
Technological field controls          -          -          -         Yes        Yes        Yes
  (first patent)
Application year dummies             Yes        Yes        Yes        Yes        Yes        Yes
Number of observations             353,762    353,762    353,762    353,762    212,274    110,511
Period                            1975-1999  1975-1999  1975-1999  1975-1999  1975-1993  1985-1993
Mean of dependent variable          0.551      0.551      0.551      0.551      0.536      0.520
Pseudo-R²                           0.0039     0.0039     0.0117     0.0251     0.0171     0.0159


Notes: (i) Results are for probit estimation, with coefficients reported at mean values and S.E. in parentheses. The
coefficient for the foreign dummy is reported over the 0-1 range. Only solo inventors are considered. Specifications
(1) through (4) consider the entire set of solo inventors. Specification (5) considers only those solo inventors who meet
the criteria in Specifications (1) through (6) in Table 4 (to help control for any truncation bias in the specialization
measure; see the discussion of Table 4 in the text). Specification (6) considers the same data as Specification (5) but
only examines cross-sections in the later part of the time period. (ii) The dependent variable is 1 if an inventor
switches fields between two consecutive innovations and 0 otherwise. The field is defined using the 414-category
technological class definition of the U.S. Patent and Trademark Office. (iii) Normalized variation in tree size is the
deviation from the year mean tree size, divided by the year standard deviation of tree size. Tree size is the log of the
number of nodes in the citation tree behind any patent. (iv) Technological field controls include dummies for each
category of the 36-category measure of Hall et al. (2001).


cross-sections later in the time period, where the citation trees capture more historical
information and may be less noisy measures of the underlying knowledge.
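Table 8 reports probit results "at mean values". A common way to produce such numbers, and the reading assumed in this sketch, is the marginal effect at the mean: φ(x̄′b)·b_k for a continuous regressor, and Φ(x̄′b with the dummy at 1) − Φ(x̄′b with the dummy at 0) for a 0-1 regressor, which matches the note that the foreign dummy is "reported over the 0-1 range". The function name and the coefficients in the usage lines are hypothetical, not the paper's underlying probit index coefficients.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_effects_at_mean(coefs, means, dummies=()):
    """Marginal effects of a probit model evaluated at regressor means.

    coefs:   index coefficients, e.g. {"const": b0, "tree": b1, "foreign": b2}
    means:   sample means of the same regressors ("const" should map to 1.0)
    dummies: names of 0-1 regressors, whose effect is a discrete change
    """
    index = sum(coefs[k] * means[k] for k in coefs)
    effects = {}
    for name, b in coefs.items():
        if name == "const":
            continue
        if name in dummies:
            # Hold other regressors at their means; switch the dummy from 0 to 1.
            base = index - b * means[name]
            effects[name] = norm_cdf(base + b) - norm_cdf(base)
        else:
            # Derivative of Phi(index) with respect to the regressor.
            effects[name] = norm_pdf(index) * b
    return effects


# Hypothetical index coefficients and means, for illustration only.
coefs = {"const": 0.0, "tree": 0.5, "foreign": 1.0}
means = {"const": 1.0, "tree": 0.0, "foreign": 0.0}
print(probit_effects_at_mean(coefs, means, dummies=("foreign",)))
```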

To summarize, we have presented six facts about innovators. Using the measures defined 
previously, we find that specialization and teamwork appear to increase with time and are also 
greater, in cross-section, in deeper areas of knowledge. Meanwhile, the average age at first inno- 
vation is increasing with time, like specialization and teamwork, but shows little variation with 
the depth of knowledge in cross-section. The following section presents a model, building on 
the burden of knowledge idea, which (a) shows how these behaviours can all emerge in equilib- 
rium and (b) clarifies the growth implications. Further, related evidence from existing literature 
is discussed in Section 4. 


3. THE MODEL 


The model considers innovator behaviour along a balanced growth path. Building on foundations 
of existing growth models, I analyse a structure with two sectors: a production sector where 
competitive firms produce a homogeneous output good and an innovation sector where
innovators produce productivity-enhancing ideas. The novelty of the model lies in the innovators’
choice problem. Innovators undertake costly human capital investments to bring themselves to 
the knowledge frontier. Innovators weigh the costs and benefits of gaining particular forms of 
expertise, decisions that will be balanced differently by different cohorts as the economy evolves 
and balanced differently in different areas of application. The model ties together the facts of 
Section 2 on the basis of these educational decisions and shows how the burden of knowledge 
interacts with other forces in determining the steady-state growth rate. 
Section 3.1 describes the production sector and Section 3.2 defines individuals’ life cycles
and preferences. Sections 3.3 and 3.4 focus on innovators. The first describes the knowledge 
space and the cost of education. The second considers the innovation process and the value of 
ideas. Section 3.5 defines individuals’ equilibrium choices, and Section 3.6 analyses educational 
decisions and growth along a balanced growth path. Proofs are presented in Appendix A. 


3.1. The production sector 


There is a continuum of productive ideas of measure $N(t) > 0$ at each time $t \in \mathbb{R}$. Let each idea
$k \in [0, N(t)]$ make a productivity contribution denoted $\gamma(k) > 0$. Define $X(t) = \int_0^{N(t)} \gamma(k)\,dk$ as
the collective productivity contribution of all existing ideas at time $t$.

Let there be a homogeneous output good produced by competitive firms at each time $t$. The
price of the good is normalized to 1 at each point in time. A firm hires an amount of labour, $l(t)$,
producing output $y(t) = X(t)\,l(t)$ if all existing ideas $k \in [0, N(t)]$ are employed by the firm.¹⁰

Firms pay workers a competitive wage, $w(t)$, and also make royalty payments. The royalty
payment per production worker is $r(k,t)$ for idea $k$ at time $t$, and the total royalty payments per
production worker are $r(t) = \int_0^{N(t)} r(k,t)\,dk$ when all existing ideas $k \in [0, N(t)]$ are employed
by the firm. Profits are $\pi(t) = (X(t) - r(t) - w(t))\,l(t)$ when all existing ideas are employed.
Ideas receive patent protection for a finite number of years $z > 0$. It is straightforward to show
that the monopolist owner of an unexpired patent can charge a royalty per worker $r(k,t) = \gamma(k)$,
and competitive firms will be just willing to pay this fee.¹¹ Meanwhile, $r(k,t) = 0$ for expired
patents, which are freely available. Hence, firms employ all available ideas, paying royalties on
all unexpired patents totalling $r(t) = X(t) - X(t - z)$ per production worker. Total output in the
economy is:

$$Y(t) = X(t)\,L_Y(t), \qquad (1)$$

where $L_Y(t)$ is the total mass of production workers. Competitive firms earn zero profits, so that
$w(t) = X(t) - r(t)$, and the wage paid to a production worker is therefore:

$$w(t) = X(t - z). \qquad (2)$$
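Equations (1) and (2), together with the royalty bill $r(t) = X(t) - X(t - z)$, split each worker's output $X(t)$ between wages and royalties. A small numerical check follows, assuming for illustration that $X$ grows at a constant rate $g$ (as along the balanced growth path analysed in Section 3.6) and using hypothetical values of $g$, $z$, and $X(0)$. Under that assumption the royalty share of output is $1 - e^{-gz}$ and the wage share is $e^{-gz}$.

```python
import math


def production_side(x0, g, z, t, labour):
    """Per-worker quantities of Section 3.1 when X(s) = x0 * exp(g * s).

    All inputs are hypothetical; `labour` plays the role of L_Y(t).
    """
    def X(s):
        return x0 * math.exp(g * s)

    wage = X(t - z)                    # equation (2): w(t) = X(t - z)
    royalties = X(t) - X(t - z)        # royalties per worker on unexpired patents
    output = X(t) * labour             # equation (1): Y(t) = X(t) L_Y(t)
    profit_per_worker = X(t) - royalties - wage  # zero under competition
    return {
        "wage": wage,
        "royalties": royalties,
        "output": output,
        "profit_per_worker": profit_per_worker,
        "royalty_share": royalties / X(t),
    }


res = production_side(x0=1.0, g=0.02, z=20.0, t=50.0, labour=1000.0)
print(res["royalty_share"], 1 - math.exp(-0.02 * 20.0))
```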


3.2. Workers and preferences 


There is a continuum of workers of measure $L(t) > 0$ in the economy at time $t$. This population
grows at rate $g_L > 0$. Individuals have a common hazard rate $\phi$ of death. Individuals are
risk-neutral, with expected utility for an individual $i$ defined by:

$$U^i(\tau) = \int_\tau^\infty c^i(\tau, t)\,e^{-\phi(t-\tau)}\,dt, \qquad (3)$$

where $c^i(\tau, t)$ is the consumption at time $t$ of an individual $i$ born at time $\tau$.


10. Firms use the whole set of existing ideas rather than just the latest idea. That is, we use a “horizontal” model of 
innovation, where ideas accumulate rather than become obsolete (see, e.g. Barro and Sala-i-Martin, 1995, for a review). 

11. This production set-up follows closely on Arrow (1962) and Nordhaus (1969). The maximization problem of
a firm can be written explicitly as follows. Define $\hat{X}(t) = \int_0^{N(t)} \gamma(k) I(k,t)\,dk$ and $\hat{r}(t) = \int_0^{N(t)} r(k,t) I(k,t)\,dk$, where
$I(k,t) = 1$ if the firm employs idea $k$ at time $t$ and $I(k,t) = 0$ otherwise. The firm's profit is
$\pi(t) = (\hat{X}(t) - \hat{r}(t) - w(t))\,l(t) = \left(\int_0^{N(t)} [\gamma(k) - r(k,t)]\,I(k,t)\,dk - w(t)\right) l(t)$. To maximize profits, the firm chooses which ideas $k$ to employ,
setting $I(k,t) = 1$ when $\gamma(k) \ge r(k,t)$ and $I(k,t) = 0$ otherwise. The holder of a patent sets $r(k,t) = \gamma(k)$ when the
patent is valid, while $r(k,t) = 0$ when the patent has expired. The firm thus sets $I(k,t) = 1$ for all $k \in [0, N(t)]$. Firms
produce with productivity $\hat{X}(t) = X(t)$, paying royalties on all unexpired patents totalling $\hat{r}(t) = X(t) - X(t - z)$ per
worker.
I assume that individuals are born without assets and supply a unit of labour inelastically
at all points over their lifetime. Following standard models of finite horizons (e.g. Blanchard,
1985), we allow for competitive life insurance and annuity firms so that loans are secured by life
insurance and assets held as annuities; thus, workers do not die in debt or with positive assets.
In the absence of physical capital, the discount rate in this model is simply $\phi$, the hazard rate of
death.¹² From the standard intertemporal budget constraint, the individual’s utility is equivalent
to the present value of his or her expected lifetime non-interest income.

The individual maximizes lifetime income through the choice of career. This is a permanent
decision made at birth. In particular, the individual may become (i) a wage worker or (ii) an
innovator. Wage workers require no education, and the expected present value of their lifetime
income is the discounted flow of the wage payments, $w(t)$, they receive. For a wage worker born
at time $\tau$,

$$U^W(\tau) = \int_\tau^\infty w(t)\,e^{-\phi(t-\tau)}\,dt. \qquad (4)$$

If an individual chooses to be an innovator, then he or she must further choose a specific
field of expertise, represented by a vector $\omega$. The innovator pays an immediate educational cost
at birth, $E(\omega, \tau)$, to bring himself or herself to the frontier of knowledge in the chosen field. He
or she earns an expected flow of income, $v(\omega, t)$, throughout their life that comes from royalties
on any innovations he or she produces. The expected present value of lifetime income for an
innovator born at time $\tau$ is:

$$U^{R\&D}(\omega, \tau) = \int_\tau^\infty v(\omega, t)\,e^{-\phi(t-\tau)}\,dt - E(\omega, \tau). \qquad (5)$$

The structure of the innovator’s educational decision, $\omega$, and the functional forms of $E(\omega, \tau)$
and $v(\omega, t)$ are defined in the following subsections.


3.3. Knowledge and education 


Let knowledge be organized as follows. First, there are “areas of application”. Second, there is
“foundational knowledge” underneath an area of application. For example, one application area
could be airplanes, building on foundational knowledge of fluid mechanics, thermodynamics, and
material science. Another application area could be drugs, building on knowledge of
immunology, protein synthesis, and bioinformatics. The amount of foundational knowledge may differ for
different areas of application.¹³

Formally, let there be $J$ areas of application, indexed $j \in \{1, 2, \ldots, J\}$, where $J$ is finite.
Within each area of application, there is a set of types of foundational knowledge arranged around
a circle of unit circumference. We denote each such circle $A_j$ and a point on such a circle as $s_j$.
The measure of foundational knowledge in an application area is denoted $D_j(t)$, as shown in
Figure 3. As helpful nomenclature, we refer to $A_j$ as a “circle of knowledge” and the measure
$D_j(t)$ as the “depth of knowledge”.


12. The riskless rate of return is zero in the absence of physical capital; the discount rate exists purely to cover 
the possibility of death. In particular, the insurance premium to secure loans and the rate of return on annuities are both 
equivalent to the hazard rate of death under the zero-profit condition for insurance and annuity firms. For example, an 
annuity firm pays a stream $v$ while you live in exchange for a dollar invested today. Expected profits for the annuity firm
are $1 - v/\phi$. The zero-profit condition then requires $v = \phi$.

13. For simplicity, we assume that all areas of application are used in the production of the homogeneous output
good. One could alternatively allow for multiple types of output goods based on different areas of application, but such 
an extension would distract from the core mechanism of the model and is thus left aside. 
FIGURE 3


A circle of knowledge 


A prospective innovator chooses an area of expertise, a vector $\omega = (j, s_j, b_j)$, which defines
(1) an application area $j \in \{1, 2, \ldots, J\}$; (2) a point, $s_j \in A_j$, in the set of foundational knowledge
types underlying application area $j$; and (3) a certain distance, $b_j \in [0, \infty)$, measured clockwise
from $s_j$.¹⁴ To ease notation, we have dropped the superscript $i$ in the vector $\omega$, and ask the reader to
recall that $\omega$ is a choice made by an individual $i$.

For an innovator born at time $\tau$, the amount of knowledge the innovator acquires is the
individual’s chosen breadth of expertise, $b_j$, multiplied by the prevailing depth of knowledge,
$D_j(\tau)$. The educational cost of acquiring this information is:


$$E(\omega, \tau) = \left(b_j D_j(\tau)\right)^{\varepsilon}, \qquad (6)$$

where $\varepsilon > 0$, which says that learning more requires a greater amount of education.
The depth of knowledge may differ across different application areas and may evolve with
time. In particular,

$$D_j(t) = \bar{D}_j X(t)^{\delta}, \qquad (7)$$

where $\bar{D}_j > 0$ is specific to the application area. Thus, the amount of foundational knowledge
for airplanes may be relatively large (high $\bar{D}_j$) compared to amusement devices (low $\bar{D}_j$). With
$\delta > 0$, the depth of knowledge increases over time as the productivity level of the economy
advances.
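Substituting (7) into (6) makes the burden of knowledge explicit. Reading equation (6) as $E(\omega,\tau) = (b_j D_j(\tau))^{\varepsilon}$, and assuming for this illustration that $X$ grows at a constant rate $g$:

```latex
E(\omega,\tau) = \bigl(b_j \bar{D}_j X(\tau)^{\delta}\bigr)^{\varepsilon}
             = \bigl(b_j \bar{D}_j\bigr)^{\varepsilon} X(\tau)^{\varepsilon\delta},
\qquad
\left.\frac{d \ln E(\omega,\tau)}{d\tau}\right|_{b_j \text{ fixed}} = \varepsilon\,\delta\, g .
```

Holding breadth fixed, each successive cohort pays more for the same expertise, and the cost is higher in areas with larger $\bar{D}_j$; this is the sense in which $D_j(\tau)$ acts as a price of breadth.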


3.4. Innovation 


Once educated, innovators begin to receive innovative ideas. The total stock of ideas at time $t$
is $N(t)$. Let each idea (i.e. each unit mass of ideas) add to productivity by an amount $\gamma$, so that
productivity evolves as $X(t) = \gamma N(t)$ when the total stock of ideas is employed.¹⁵ Recalling that


14. We allow $b_j$ to take values greater than 1, that is, for an innovator’s expertise to wrap around the circle multiple
times. One can imagine that innovators gain further educational value by covering the same foundational knowledge
again; for example, re-reading material creates better understanding than one’s first read. This assumption is largely
made for technical reasons, however, to avoid dealing with corner solutions where choices of $b_j$ are capped at some
finite maximum value. Corner solutions can be handled in a variation of the model but are awkward and add no important
insights. 

15. One could alternatively allow the size of ideas to grow or decline with time, or allow the size of ideas to be 
functions of educational choices. Such specifications would have no substantive effect on the analysis. See Jones (2005b)
and the discussion in Section 4. 
an idea can be licensed to $L_Y(t)$ workers and patents last for $z$ years, the lump sum value of an
idea is:

$$V(t) = \int_t^{t+z} \gamma L_Y(t')\,dt'. \qquad (8)$$

Note that patents do not expire on the death of an innovator.¹⁶ Like any asset, the innovator
prefers to hold a patent as an annuity, selling a patent to a competitive annuity firm in exchange
for an annuity $\phi V(t)$. For the innovator, the present value of this annuity at time $t$ is $V(t)$.¹⁷
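Along a balanced growth path, where $L_Y$ grows at the population rate $g_L$, the integral in (8) has the closed form $V(t) = \gamma L_Y(t)\,(e^{g_L z} - 1)/g_L$. The sketch below checks that closed form against a midpoint-rule integral; the function names and parameter values are hypothetical. It also records why the annuity $\phi V(t)$ is worth exactly $V(t)$: with a zero riskless rate, a flow paid until death at hazard $\phi$ has expected value flow/$\phi$ (the logic of footnote 12).

```python
import math


def idea_value_closed_form(gamma, labour_now, g_L, z):
    """V(t) from equation (8) when L_Y grows at rate g_L from labour_now."""
    if g_L == 0.0:
        return gamma * labour_now * z
    return gamma * labour_now * (math.exp(g_L * z) - 1.0) / g_L


def idea_value_numeric(gamma, labour_now, g_L, z, steps=100_000):
    """Midpoint-rule approximation of the integral in equation (8)."""
    h = z / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h                      # years since the idea was patented
        total += gamma * labour_now * math.exp(g_L * s) * h
    return total


def annuity_value(flow, phi):
    """Expected value of `flow` paid until death at hazard phi (zero riskless rate)."""
    return flow / phi


# Hypothetical: idea size 0.01, 1000 production workers, 1% growth, 20-year patents.
V = idea_value_closed_form(0.01, 1000.0, 0.01, 20.0)
print(V, idea_value_numeric(0.01, 1000.0, 0.01, 20.0))
```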


3.4.1. Inspiration. Ideas come to an innovator at rate $\lambda(\omega, t)$. This arrival rate depends
in part on an innovator’s educational decision, $\omega$, which is fixed over an innovator’s lifetime, and
in part on the overall state of the economy, which evolves with time. In particular,

$$\lambda(\omega, t) = \lambda_j X(t)^{\psi} L_j(t, s_j)^{-\sigma} b_j^{\beta}, \qquad (9)$$

where $L_j(t, s_j)$ is the mass of living innovators at time $t$ who have chosen location $s_j$, while $\lambda_j$
represents application-area-specific research opportunities.

This reduced-form specification captures several key ideas. The parameter $\psi$ represents the
impact of the current state of technology on an innovator’s creative output. It incorporates the
standard ideas in the literature alluded to in the introduction: fishing-out hypotheses whereby
innovators’ productivity falls as the state of knowledge advances ($\psi < 0$), and rising
technological opportunity whereby an improving state of knowledge makes innovators more productive
($\psi > 0$). The term $\lambda_j > 0$ meanwhile allows for technological opportunities to vary across
application areas, so that some areas can be relatively “hot” or “cold”.

The parameter $\sigma$ represents the impact of crowding on the frequency of an innovator’s ideas.
I assume $0 < \sigma < 1$, following standard arguments where innovators partly duplicate each other's
work. A greater density of workers in the same specialty increases duplication, reducing the rate
at which a specific individual produces a novel idea.¹⁸

The final parameter, $\beta$, represents the impact of the breadth of expertise. We assume $\beta > 0$,
which says simply that broader foundational knowledge increases one’s productivity. This is
natural if, for example, access to a broader set of available knowledge (facts, theories, methods)
creates better combinatorial possibilities for one’s creativity, along the lines of Weitzman (1998),
making the innovator more productive.¹⁹
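Equation (9) can be written as a one-line function, which makes the comparative statics in the text easy to check. This is a sketch with hypothetical argument names: `psi` stands for the exponent on $X(t)$ that the text describes as the impact of the current state of technology, `crowd` for $L_j(t, s_j)$, and all numbers in the usage lines are illustrative.

```python
def arrival_rate(lam_j, X, psi, crowd, sigma, breadth, beta):
    """Idea arrival rate lam_j * X**psi * crowd**(-sigma) * breadth**beta.

    crowd is L_j(t, s_j), the mass of living innovators at the chosen location.
    """
    if crowd <= 0:
        raise ValueError("crowd must be positive; the rate diverges as it goes to 0")
    return lam_j * X ** psi * crowd ** (-sigma) * breadth ** beta


base = arrival_rate(lam_j=1.0, X=10.0, psi=-0.2, crowd=4.0, sigma=0.5, breadth=0.5, beta=0.3)
# Doubling crowding lowers the rate by the factor 2**(-sigma);
# doubling breadth raises it by the factor 2**beta.
more_crowded = arrival_rate(1.0, 10.0, -0.2, 8.0, 0.5, 0.5, 0.3)
broader = arrival_rate(1.0, 10.0, -0.2, 4.0, 0.5, 1.0, 0.3)
print(more_crowded / base, broader / base)
```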


3.4.2. Implementation. Ideas are implemented by pooling requisite foundational knowl- 
edge. Implementation thus involves the formation of teams. This process operates under simpli- 
fying assumptions as follows. 


16. This is a realistic feature of the model: in the real world, patent rights are assignable and patents do not expire 
on the death of an inventor. 

17. If the innovator did not have access to a competitive financial market that pays the innovator the lump sum 
value of the patent (or an equivalent annuity) in exchange for the patent rights, then the value of the patent to the 
innovator would need to reflect the possibility that the innovator dies before the patent rights expire, in which case 
$V(t) = \int_t^{t+z} \gamma L_Y(t')\,e^{-\phi(t'-t)}\,dt'$. This variation will have no impact on the main results of the model.

18. An alternative formulation of equation (9), where individuals crowd over an interval of knowledge rather than a 
point of knowledge, can explain the six stylized facts of Section 2 along the same lines as this model but is less tractable. 

19. There are many other mechanisms through which broader expertise would enhance an innovator’s productivity. 
For example, a more broadly expert innovator may better evaluate the expected impact and feasibility of his or her 
ideas. He or she will better select towards high value, successful lines of inquiry, and therefore achieve greater returns. 
Furthermore, if assembling teams is costly, innovators will be unwilling to form large teams. Innovators with broader
expertise can rely less on large teams for the implementation of their ideas, making their ideas less costly to implement.
First, it is assumed that implementation requires all types of foundational knowledge in the
given area of application. Formally, define ideas in application area $j$ as implementable if for
each $s_j \in A_j$, there exists an individual $i$ in application area $j$ such that $s_j$ lies in the arc between
$s_j^i$ and $s_j^i + b_j^i$. Let $I_j(t)$ be an indicator function equal to 1 if the idea is implementable and 0
otherwise.
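The implementability condition is an arc-covering test on the unit circle: the arcs from $s_j^i$ to $s_j^i + b_j^i$ held by innovators in area $j$ must jointly cover $A_j$. A minimal sketch of that test follows, assuming each innovator is summarised by a hypothetical pair `(start, breadth)` with positions measured clockwise as fractions of the circumference. Arcs that cross the point 0 are split in two, and any breadth of at least 1 covers the circle by itself (footnote 14 allows such wrap-around).

```python
def covers_circle(arcs, tol=1e-12):
    """True if the arcs [start, start + breadth] jointly cover a unit circle."""
    intervals = []
    for start, breadth in arcs:
        if breadth >= 1.0:
            return True                        # one innovator wraps the whole circle
        start = start % 1.0
        end = start + breadth
        if end <= 1.0:
            intervals.append((start, end))
        else:                                  # arc crosses 0: split it in two
            intervals.append((start, 1.0))
            intervals.append((0.0, end - 1.0))
    intervals.sort()
    reach = 0.0                                # circle covered on [0, reach] so far
    for start, end in intervals:
        if start > reach + tol:
            return False                       # uncovered gap before this arc
        reach = max(reach, end)
        if reach >= 1.0 - tol:
            return True
    return reach >= 1.0 - tol


print(covers_circle([(0.9, 0.3), (0.2, 0.7)]))  # True
print(covers_circle([(0.0, 0.4), (0.5, 0.4)]))  # False: gap between 0.4 and 0.5
```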

Second, we assume that the innovator with the idea claims its rents. One may imagine that 
innovators work in firms, pooling knowledge in application area $j$, where the wage paid to each
innovative worker is the value of his or her ideas. Alternatively, one may think of the innovator
with an idea as a monopolist vis-a-vis potential teammates so that the inspired innovator extracts 
all profits from the project. We abstract from costs in team formation or operation, so that all 
ideas are profitable with lump sum value V (t). 

The expected flow of income to an innovator, $v(\omega, t)$, is then the probability an idea arrives
and is implementable times the lump sum value of the idea.²⁰ Hence,

$$v(\omega, t) = \lambda(\omega, t)\, I_j(t)\, V(t). \qquad (10)$$


Finally, we consider below the size of teams. To simplify this analysis, we assume that 
an inspired innovator forms teams within his or her own cohort if possible and assembles the 
minimum number of people necessary to implement the idea. 
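Under these assumptions the minimum team has a simple form when every member of a cohort in area $j$ has the same breadth $b_j$ (an assumption made for this sketch): arcs of length $b_j$ laid end to end cover the unit circle once there are $\lceil 1/b_j \rceil$ of them, and fewer cannot, since their total length would fall short of 1. The helper name is hypothetical.

```python
import math


def minimum_team_size(breadth, tol=1e-12):
    """Smallest number of arcs of equal length `breadth` that can cover a unit circle."""
    if breadth <= 0:
        raise ValueError("breadth must be positive")
    # Subtract a small tolerance so floating-point noise just above an
    # integer does not add a spurious extra team member.
    return max(1, math.ceil(1.0 / breadth - tol))


for b in (1.5, 1.0, 0.3, 0.25, 0.1):
    print(b, minimum_team_size(b))
```

As $b_j$ falls, the minimum team grows, which is the link between narrower expertise and larger teams that the model uses.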


3.5. Equilibrium career choices 


In equilibrium, no individual can make a different choice of career and be better off. For an
individual born at any time $\tau$, the decision to become a wage worker requires that

$$U^W(\tau) \ge U^{R\&D}(\omega', \tau) \quad \forall\, \omega',$$

so that wage workers would not strictly prefer to be R&D workers. Similarly, for an individual
born at any time $\tau$, the decision to become an R&D worker with educational choice $\omega$ requires
that

$$U^{R\&D}(\omega, \tau) \ge U^{R\&D}(\omega', \tau) \quad \forall\, \omega', \qquad U^{R\&D}(\omega, \tau) \ge U^W(\tau),$$

so that R&D workers of type $\omega$ would not strictly prefer to be R&D workers of a different type
or wage workers.


3.5.1. Balanced growth path. We focus on equilibrium career decisions along a balanced
growth path. A balanced growth path is defined such that the growth rate in productivity,
$g = \dot{X}(t)/X(t)$, is constant over time and the labour allocations $L_Y(t)$ and $L_j(t,s)$, for all $j, s$, grow with


20. One can also consider rent sharing among teammates, which adds considerable complexity. With rent-sharing, 
equilibrium income flow will still take the form of equation (10). This follows because innovators in the same cohort earn 
the same income in equilibrium, which must then be the per capita rate of idea arrival times the value of ideas. At the same 
time, rent sharing can create an inefficiency should innovators expand their expertise not only to improve their creative 
output but also to claim greater royalty shares from their teammates. While rent sharing can thus affect the benefits of 
breadth, the basic idea that the burden of knowledge raises the cost of breadth, provoking increased specialization and 
teamwork, will be robust under a wide variety of rent-sharing arrangements. One might also consider many other possible 
frictions and inefficiencies in team formation. The model presented here assumes that such frictions and incentive issues are
resolved, allowing us to focus on a benchmark outcome. Also, see Jones (2008) for a model that features the intersection
between educational decisions and frictions in team formation. 
time at the population growth rate $g_L$. The existence of a balanced growth path is established in
the analysis below.

We analyse the balanced growth path under three parametric restrictions that are assumed 
throughout the following analysis. 


Assumption 1. $\beta < \varepsilon$.
This assumption is necessary for an innovator's optimal breadth of expertise, the choice $b_j$,
to be an interior maximum.


Assumption 2. y — p (ô— 1) = |. 
This assumption is necessary for the existence of a constant productivity growth rate g. 


l—o EI 
Ir 
This assumption is necessary for an individual's lifetime income to be finite?! 


Assumption 3. 4 - max(g,[1-4- f (6 — DJe}, where g = 


3.6. Analysis 


Production workers receive a competitive wage $w(t) = X(t - z)$ as shown in equation (2). Along
a balanced growth path, $X(t - z)$ grows at rate $g$ so that, from equation (4), a production worker
earns lifetime income:

$$U^W(\tau) = \frac{X(\tau - z)}{\phi - g}, \qquad (11)$$

where we require $\phi > g$ for finite lifetime income.²²
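Equation (11) follows from (4) because, along a balanced growth path, $w(t) = X(\tau - z)e^{g(t-\tau)}$, so the integrand is $X(\tau - z)e^{-(\phi - g)(t-\tau)}$. The sketch below compares the closed form with a truncated midpoint-rule integral; the function names and the values of $\phi$, $g$, and $X(\tau - z)$ are hypothetical.

```python
import math


def wage_worker_income_closed_form(x_at_tau_minus_z, phi, g):
    """Equation (11): U^W(tau) = X(tau - z) / (phi - g), valid only for phi > g."""
    if phi <= g:
        raise ValueError("lifetime income is infinite unless phi > g")
    return x_at_tau_minus_z / (phi - g)


def wage_worker_income_numeric(x_at_tau_minus_z, phi, g, horizon=2000.0, steps=200_000):
    """Truncated midpoint-rule version of equation (4) on a balanced growth path."""
    h = horizon / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h                            # s = t - tau, the worker's age
        wage = x_at_tau_minus_z * math.exp(g * s)    # w(t) = X(t - z) grows at rate g
        total += wage * math.exp(-phi * s) * h       # discount at the death hazard phi
    return total


# Hypothetical values: X(tau - z) = exp(-0.4), phi = 0.04, g = 0.02.
closed = wage_worker_income_closed_form(math.exp(-0.4), 0.04, 0.02)
numeric = wage_worker_income_numeric(math.exp(-0.4), 0.04, 0.02)
print(closed, numeric)
```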

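The innovator, meanwhile, makes an educational choice to maximize lifetime income. With
the objective function (5) and the definitions in equations (6), (9), and (10), the innovator’s
problem is:

$$\max_{\omega = (j, s_j, b_j)} \int_\tau^\infty \lambda_j X(t)^{\psi} L_j(t, s_j)^{-\sigma} b_j^{\beta}\, I_j(t)\, V(t)\, e^{-\phi(t-\tau)}\,dt - \left(b_j D_j(\tau)\right)^{\varepsilon}. \qquad (12)$$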

The first result regarding career choice establishes the useful property of income equivalence 
between innovators and wage workers. 


Lemma 1. Along a balanced growth path $U^{R\&D}(\omega, \tau) = U^W(\tau)$ for any equilibrium
choice $\omega$ and any cohort $\tau$.


This income equivalence result rules out corner solutions where all individuals choose to be 
wage workers or all choose to be innovators. It follows naturally in the set-up of the model. Wage 
workers are needed to create a market for innovations, and innovators are too productive when 
rare to fail to exist. Along a balanced growth path, masses of wage workers and innovators are 
all growing, so that individuals actively choose both broad types of careers in every cohort, and 
hence, their income must be equivalent in equilibrium. 

The next results further define innovator behaviour, building on the choice of $\omega = (j, s_j, b_j)$.


21. The death rate $\phi$ is the discount rate. One could add a pure rate of time preference to the model, in addition to
the death rate $\phi$, which would raise the discount rate and allow lifetime income to be bounded under higher growth rates.

22. It is demonstrated in the proof of Proposition 2 that finite income for wage workers follows from Assumption 3
along a balanced growth path.


(© 2009 The Review of Economic Studies Limited 


JONES KNOWLEDGE BURDEN MECHANISM 305 


Proposition 1. Along a balanced growth path:
(i) $I_j(t) = 1$ for all $j, t$.
(ii) $L_j(t,s') = L_j(t,s'')$ for all $s', s''$ in an area of application $j$.
(iii) $E(\omega,\tau)/U^{R\&D}(\omega,\tau) = \dfrac{\beta}{\epsilon - \beta}$, where $\beta < \epsilon$.


Result (i) says that innovators exist with sufficient expertise to implement any idea in any application area. This follows because duplication is costly, so that innovators seek to avoid crowding. In particular, with $\sigma > 0$, any area of application with no active innovators becomes too tempting to ignore: an innovator would always deviate to such an area. Result (ii) follows from similar reasoning. It says that innovators spread evenly within a given application area. This follows because, within an application area, there are no costs or benefits of a particular location $s$ except the relative density of innovators there. Hence, with $\sigma > 0$, innovators avoid crowding and array themselves evenly. For clarity, we denote the labour allocation $L_j(t,s)$ as $L_j(t)$ in equilibrium to emphasize that it is independent of $s$ in a given area of application $j$. The total mass of innovators at time $t$ is then $L_{R\&D}(t) = \sum_j L_j(t)$, and the total mass of wage workers is $L_Y(t) = L(t) - L_{R\&D}(t)$.

Result (iii) is less obvious and more powerful. It says that the ratio between educational expenditure and lifetime income is constant, regardless of the innovator cohort or particular equilibrium area of expertise. This follows from the choice of $b_j$, which equates the marginal cost and benefit of breadth. Generally, if we view $D_j(\tau)$ as the "price" of breadth, then an increased price results in decreased breadth, offsetting the rise in total educational cost. In this model, price and quantity are traded off exactly, so that educational cost is a constant fraction of lifetime income. This type of result should be familiar from Cobb-Douglas specifications, which feature constant expenditure shares.²³ This result requires that a choice $b_j$ represents an interior maximum, so that the marginal benefits and costs of breadth are equated. This is guaranteed as long as $\beta < \epsilon$ (Assumption 1), as shown in Appendix A.
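To make the constant-expenditure-share logic concrete, the sketch below assumes, as in the appendix proof of Result (iii), that an innovator's discounted gross income is isoelastic in breadth, K·b^β, where K is a hypothetical constant collecting every term that does not depend on b, and that education costs (D·b)^ε, a reading of equation (6) consistent with equation (13). It solves the first-order condition in closed form and reports the ratio E/U, which should equal β/(ε − β) whatever the values of K and the "price" D.

```python
def optimal_breadth(K, D, beta, eps):
    """Interior maximizer of U(b) = K*b**beta - (D*b)**eps, assuming 0 < beta < eps.

    Assumption: gross income K*b**beta and education cost (D*b)**eps.
    First-order condition: beta*K*b**(beta-1) = eps*D**eps*b**(eps-1),
    so b* = (beta*K / (eps*D**eps))**(1/(eps-beta)).
    """
    if not 0 < beta < eps:
        raise ValueError("need 0 < beta < eps (Assumption 1) for an interior maximum")
    return (beta * K / (eps * D ** eps)) ** (1.0 / (eps - beta))

def net_income(b, K, D, beta, eps):
    """Lifetime income net of education cost for breadth b."""
    return K * b ** beta - (D * b) ** eps

def education_share(K, D, beta, eps):
    """Return (b*, E/U) evaluated at the optimal breadth."""
    b = optimal_breadth(K, D, beta, eps)
    E = (D * b) ** eps
    U = net_income(b, K, D, beta, eps)
    return b, E / U

beta, eps = 0.5, 1.5  # illustrative values with beta < eps
for K, D in [(1.0, 1.0), (10.0, 1.0), (1.0, 4.0)]:
    b, share = education_share(K, D, beta, eps)
    print(f"K={K:5.1f} D={D:4.1f}  b*={b:.4f}  E/U={share:.4f}")
```

With β = 0.5 and ε = 1.5 every line reports E/U = 0.5000, while b* falls sharply when D rises from 1 to 4: a higher price of breadth buys less breadth at an unchanged expenditure share.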

Result (iii) is a key property of the equilibrium from which other results follow. As a first example, recall that $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ in equilibrium, so that innovators' income is independent of the particular equilibrium choice $\omega$. It then follows directly from result (iii) that $E(\omega,\tau)$ must likewise be independent of the particular equilibrium choice $\omega$. This result is encapsulated as part of the following corollary.


Corollary 1. Total education:

(i) (Cross-section) $E(\omega,\tau) = E(\omega',\tau)$ for any two equilibrium choices $\omega, \omega'$ made by individuals in the same cohort $\tau$.
(ii) (Time series) $E(\omega,\tau)$ grows across cohorts at rate $g_E = g$.


These results inform the two key empirical facts regarding educational attainment from Section 2. Result (i) says that innovators in the same cohort choose the same amount of education across different areas of application. What is particularly surprising is that this result holds even though some areas may feature a greater difficulty in reaching the frontier of knowledge (higher $D_j$) and some areas may be “hotter” than others, featuring more innovative opportunities (higher $A_j$). This uniformity of education is possible through the endogenous allocation of innovators to different careers. Innovators allocate themselves across application areas to neutralize income differences (and hence educational differences), using differences in the degree of congestion to offset variation in technological opportunities or educational burden.


23. The technical basis for this type of result lies in isoelasticity. In particular, innovator output is isoelastic in breadth (just as output is isoelastic in the inputs in a Cobb-Douglas specification). Isoelasticity drives the constant expenditure share.
Result (ii) follows directly along the balanced growth path, where income is growing at rate $g$ and hence education is too, maintaining the constant ratio dictated by Proposition 1, part (iii).
The model thus can match two key empirical facts of Section 2: common educational attainment 
across widely different areas of application, yet growing education over time. 

Denote the common educational attainment $E(\omega,\tau)$ within a cohort $\tau$ as $E^*(\tau)$. This equivalence of education in turn has direct implications for the breadth of expertise. Rearranging equation (6), we see that:

$$b_j = \frac{E(\omega,\tau)^{1/\epsilon}}{D_j(\tau)}. \qquad (13)$$


Common $E(\omega,\tau) = E^*(\tau)$ within a cohort then implies that the equilibrium breadth of expertise will differ only by area of application $j$ and cohort $\tau$. In particular, denoting the equilibrium choice of breadth as $b^*_j(\tau)$ and the growth rate in $b^*_j(\tau)$ across successive cohorts as $g_{b^*_j}$, we find the following results.


Corollary 2. Breadth of expertise:

(i) (Cross-section) $b^*_j(\tau)/b^*_{j'}(\tau) = D_{j'}/D_j$.
(ii) (Time series) $g_{b^*_j} = (1/\epsilon - \delta)\,g$, so that $g_{b^*_j} < 0$ iff $\delta > 1/\epsilon$.


The first result says that innovators in areas with deeper knowledge choose narrower expertise. This follows naturally from common $E^*(\tau)$: where the depth of knowledge is higher, the breadth of expertise is lower (see equation (13)). Interestingly, although field-specific technological opportunities influence the marginal benefit of breadth (see the $A_j$ term in equation (12)), the endogenous labour allocation across fields neutralizes this effect, so that the relative breadth of expertise across fields is, in equilibrium, independent of how valuable knowledge is and is determined solely by the cost side.

The second result tells how the breadth of expertise evolves along the growth path. From equation (13), the evolution of specialization across cohorts is a race between growing educational attainment, $E^*(\tau)$, and a growing distance to the knowledge frontier, $D_j(\tau)$. Only when the distance to the frontier is growing at a sufficient rate (high enough $\delta$, namely $\delta > 1/\epsilon$) will innovators become more specialized even as they invest more in education.
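The race between education and depth can be illustrated with a short sketch. It assumes $b_j = E^*(\tau)^{1/\epsilon}/D_j(\tau)$ with $D_j(\tau) = D_j X(\tau)^{\delta}$, a reading of equations (13) and (7), and lets $E^*(\tau)$ and $X(\tau)$ both grow at rate $g$ across cohorts, so that breadth should grow at $(1/\epsilon - \delta)g$. The function names and parameter values are illustrative only.

```python
import math

def breadth(E_star, D_j, X, eps, delta):
    """Assumed form: b_j = E*(tau)**(1/eps) / D_j(tau), with D_j(tau) = D_j * X(tau)**delta."""
    return E_star ** (1.0 / eps) / (D_j * X ** delta)

def breadth_growth_rate(g, eps, delta, dt=1.0):
    """Log growth rate of breadth between cohorts dt apart, when E* and X both grow at rate g."""
    b_early = breadth(1.0, 1.0, 1.0, eps, delta)
    b_late = breadth(math.exp(g * dt), 1.0, math.exp(g * dt), eps, delta)
    return math.log(b_late / b_early) / dt

g, eps = 0.02, 1.5  # illustrative values; 1/eps is about 0.667
for delta in (0.5, 1.0):
    print(delta, breadth_growth_rate(g, eps, delta))  # positive for 0.5, negative for 1.0
```

Because $E^*(\tau)$ and $X(\tau)$ cancel when comparing two areas at the same date, the ratio of breadths for areas with depths D1 and D2 equals D2/D1 for any E and X, which is the cross-section result of Corollary 2.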

The model thus can also match two further empirical facts of Section 2 regarding specializa- 
tion: greater specialization in areas with deeper underlying knowledge and increasing specializa- 
tion over time. Moreover, increasing specialization—despite increasing educational attainment— 
is directly associated in the model with increasing depth of knowledge along the growth path. 

Finally, a simple, related outcome regards teamwork. In the model, innovation requires expertise over the whole set of knowledge underlying a given area of application.²⁴ Hence, teamwork is required when an individual innovator does not cover the entire circle of knowledge. With an equilibrium decision $b^*_j(\tau)$, the team size in a given cohort and application area is:²⁵

$$team_j(\tau) = \begin{cases} \left\lceil 1/b^*_j(\tau) \right\rceil, & b^*_j(\tau) < 1 \\ 1, & b^*_j(\tau) \ge 1 \end{cases} \qquad (14)$$

where $\lceil x \rceil$ is the ceiling function; that is, $\lceil x \rceil$ is the least integer $\ge x$. The following corollary then follows directly from Corollary 2.


24. See Jones (2005b) for a model where implementation of ideas need not cover the entire set of knowledge in 
a given area of application. That model details a more general set of conditions under which greater teamwork follows 
from increased specialization. 

25. Recall from Section 3.4.2 that teams are formed (i) within the same cohort when possible and (ii) with the minimum number of necessary teammates.
Corollary 3. Teamwork:

(i) (Cross-section) $team_j(\tau) > team_{j'}(\tau)$ iff $D_j > D_{j'}$.
(ii) (Time series) $team_j(\tau) > team_j(\tau')$ for any $\tau > \tau'$ iff $\delta > 1/\epsilon$.


The model therefore identifies greater teamwork in cross-section with deeper areas of knowl- 
edge and identifies increased teamwork over time with a rising burden of knowledge. Collec- 
tively, Corollaries 1, 2, and 3 show how innovator behaviour varies across fields and evolves as 
the economy grows, providing a unified and consistent interpretation for the six facts presented 
in Section 2. 
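Equation (14) and the time-series part of Corollary 3 can be illustrated with the short sketch below. It treats the circle of knowledge as having length 1, as the ceiling formula in 1/b implies, and lets breadth evolve across cohorts at the rate from Corollary 2; the starting breadth, the parameter values, and the function names are hypothetical.

```python
import math

def team_size(b):
    """Equation (14): ceil(1/b) teammates if breadth b < 1, otherwise a solo innovator."""
    if b <= 0:
        raise ValueError("breadth must be positive")
    return math.ceil(1.0 / b) if b < 1 else 1

def team_path(b0, g, eps, delta, cohorts):
    """Team sizes for cohorts 0..cohorts-1 when breadth grows at (1/eps - delta)*g per unit time."""
    rate = (1.0 / eps - delta) * g
    return [team_size(b0 * math.exp(rate * tau)) for tau in range(cohorts)]

# Illustrative: breadth starts at 0.9 of the circle of knowledge.
rising = team_path(0.9, 0.02, 1.5, 1.0, 200)   # delta > 1/eps: breadth shrinks
falling = team_path(0.9, 0.02, 1.5, 0.5, 200)  # delta < 1/eps: breadth grows
print(rising[0], rising[-1], falling[-1])       # 2 5 1
```

Because team size is an integer, it rises in discrete steps, so adjacent cohorts can share the same team size even while breadth keeps shrinking.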

A key mechanism in pinning down innovator behaviour is their choice of application area. 
This endogenous choice allows the equalization of lifetime income, which in turn allows the 
model to pin down educational attainment and other behaviours as shown previously. It is in- 
structive to show explicitly the resulting allocation of labour across application areas. 


Corollary 4. Along a balanced growth path, the ratio of labour allocations in different application areas in the R&D sector is a constant where:

$$\frac{L_j(t)}{L_{j'}(t)} = \left[\frac{A_j}{A_{j'}}\left(\frac{D_{j'}}{D_j}\right)^{\beta}\right]^{1/\sigma}. \qquad (15)$$
We see that innovators are attracted to hot application areas (high $A_j$) and areas with low learning costs (low $D_j$). The interesting consequence is that while choice of application area is influenced by these cross-field variations, educational attainment does not vary across areas (Corollary 1). Meanwhile, breadth of expertise does vary with knowledge depth (Corollary 2).


The endogenous labour allocation thus helps neutralize educational attainment but not special- 
ization, allowing the model to unify the facts of Section 2. 
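The labour allocation in Corollary 4 can be sketched numerically. The code assumes, following the appendix proof of Corollary 4, that innovator income in area j depends on the area only through the factor $A_j L_j^{-\sigma} b_j^{\beta}$, and that breadth is inversely proportional to $D_j$ within a cohort (Corollary 2). The area parameters, the cohort constant 0.4, and the function names are hypothetical. Allocating innovators with $L_j$ proportional to $(A_j D_j^{-\beta})^{1/\sigma}$ should then equalize that factor across areas.

```python
def allocate_innovators(A, D, beta, sigma, total=1.0):
    """Split `total` innovators so that L_j is proportional to
    (A_j * D_j**(-beta))**(1/sigma), the ratio form of equation (15)."""
    weights = [(a * d ** (-beta)) ** (1.0 / sigma) for a, d in zip(A, D)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

def field_payoff(A_j, L_j, b_j, beta, sigma):
    """Assumed area-specific factor in innovator income: A_j * L_j**(-sigma) * b_j**beta."""
    return A_j * L_j ** (-sigma) * b_j ** beta

# Hypothetical areas: baseline, "hot" (high A), and deep (high D, modest A).
A = [1.0, 2.0, 0.5]
D = [1.0, 1.0, 3.0]
beta, sigma = 0.5, 0.8
L = allocate_innovators(A, D, beta, sigma, total=100.0)
b = [0.4 / d for d in D]  # breadth inversely proportional to D_j; 0.4 is a hypothetical cohort constant
print([round(x, 2) for x in L])
print([field_payoff(a, l, bj, beta, sigma) for a, l, bj in zip(A, L, b)])  # equal across areas
```

The hot area receives the largest share and the deep, low-opportunity area the smallest, yet the printed payoffs coincide, which is the income equalization that keeps educational attainment common across areas.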











3.6.1. Steady-state growth. We now consider the implications of the knowledge burden mechanism for aggregate growth. Growth comes from the summation of contributions from all innovators alive at a given moment. If there are $L_{R\&D}(t)$ innovators active in equilibrium at time $t$ and these innovators raise productivity in the economy on average at rate $\bar\theta(t)$, then productivity increases per unit of time are simply $\dot X(t) = \bar\theta(t)\,L_{R\&D}(t)$. The growth rate of productivity is then:

$$g = \frac{\dot X(t)}{X(t)} = \frac{\bar\theta(t)\,L_{R\&D}(t)}{X(t)}. \qquad (16)$$


Calculating innovators' average contributions, $\bar\theta(t)$, appears complicated because innovators are active in different areas of application with unique innovative opportunities and knowledge depth, and innovators come from different cohorts. However, aggregating innovators' contributions is simplified by the following result. In equilibrium, innovators in the same cohort add to productivity at the same rate, regardless of their area of application. The intuition builds from the results above: once individuals in the same cohort have equivalent $U^{R\&D}(\omega,\tau)$ and equivalent $E(\omega,\tau)$ in equilibrium, their expected gross income from innovation, $U^{R\&D}(\omega,\tau) + E(\omega,\tau)$, and hence their productivity contributions must also be equivalent. This property, which is shown formally in the proof of the following proposition, allows $g$ to be determined as a simple function of exogenous parameters.
Proposition 2. Along a balanced growth path,

$$g = \frac{1-\sigma}{1-\gamma+\beta(\delta - 1/\epsilon)}\,g_L, \qquad (17)$$

where $\gamma - \beta(\delta - 1/\epsilon) < 1$. There is a unique balanced growth path in equilibrium, with the constant growth rate $g$ given in equation (17) and a set of labour allocations $(L_1(t), \ldots, L_J(t), L_Y(t))$ where each labour allocation grows at rate $g_L$.


The expression (17) defines the growth rate as the outcome of several important forces, marrying the knowledge burden mechanism with several ideas in the existing growth literature. First, the parameter $\gamma$, as discussed previously, represents standard ideas in the growth literature, whereby the productivity of innovators may increase as they gain access to new technologies and new ideas ($\gamma > 0$) or decrease if innovators are fishing out ideas ($\gamma < 0$). The larger $\gamma$ (the greater the value of knowledge to further innovation), the greater the growth rate, as is seen in equation (17).

Second, the term $\beta(\delta - 1/\epsilon)$ captures the implications of the burden of knowledge. The term $(\delta - 1/\epsilon)$ is recognized from Corollary 2. With $\delta > 1/\epsilon$, innovators choose increasing specialization as the economy evolves, and we witness the “death of the Renaissance Man” along the growth path. The impact of narrowing expertise on growth will be large or small depending on the value of $\beta$, which defines the sensitivity of innovators' productivity to their breadth of expertise.²⁶

Expression (17) also shows that the model eliminates scale effects. The productivity growth rate is constant despite an exponentially increasing scale of research effort, with the number of researchers growing at the population growth rate, $g_L$. In the model, growing population provides both the motive (increasing market size) and the means (more minds) for innovative effort to grow at an exponential rate in equilibrium, even if innovation is getting harder per person.
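A small sketch of the steady-state growth rate helps show how the burden of knowledge parameters enter. It takes equation (17) to be $g = (1-\sigma)g_L/(1-\gamma+\beta(\delta - 1/\epsilon))$, which follows from combining Corollary 2 with the aggregation in the proof of Proposition 2. The function name and parameter values are illustrative, not estimates.

```python
def steady_state_growth(g_L, sigma, gamma, beta, delta, eps):
    """Equation (17) as read here: g = (1 - sigma) * g_L / (1 - gamma + beta*(delta - 1/eps)).

    The denominator is positive exactly when gamma - beta*(delta - 1/eps) < 1.
    """
    denom = 1.0 - gamma + beta * (delta - 1.0 / eps)
    if denom <= 0:
        raise ValueError("need gamma - beta*(delta - 1/eps) < 1")
    return (1.0 - sigma) * g_L / denom

# Illustrative parameters only.
g_L, sigma, gamma, beta, eps = 0.01, 0.2, 0.5, 0.5, 1.5
g_flat = steady_state_growth(g_L, sigma, gamma, beta, 1.0 / eps, eps)  # breadth constant across cohorts
g_burden = steady_state_growth(g_L, sigma, gamma, beta, 1.0, eps)      # delta > 1/eps: rising burden
print(g_flat, g_burden)  # about 0.016 and 0.012
```

Only the population growth rate g_L appears, not the level of population or research effort, so a larger economy does not grow faster; and raising δ above 1/ε lowers g from about 0.016 to about 0.012 in this example.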

From a growth point of view, the burden of knowledge parameters $\beta(\delta - 1/\epsilon)$ are seen to act similarly in equation (17) to the parameter $\gamma$ that captures any fishing-out effect. Two interpretations of the burden of knowledge mechanism are then possible. First, the burden of knowledge mechanism can be seen as a micro-foundation for fishing-out-type effects on growth without literally believing that ideas are being fished out. Alternatively, if one is convinced that a fishing-out process operates independently, then the burden of knowledge can be seen as an additional effect constraining the growth rate. Existing explanations of why innovation may be getting harder, in the growth literature (Kortum, 1997; Segerstrom, 1998) and the innovation literature (e.g. Evenson, 1991; Henderson and Cockburn, 1996), have focused on a fishing-out idea. This paper offers the burden of knowledge as a mechanism that makes innovation harder and acts similarly on the growth rate, thus explaining aggregate data trends in addition to the micro-facts presented in this paper.


4. DISCUSSION 


This paper is built on two observations. First, innovators are not born at the frontier of knowledge 
but must initially undertake significant education. Second, the distance to the frontier may vary 
across fields and over time. Motivated by these observations, I present six novel facts about 
innovation in cross-section and time series and a model that ties these facts together. 


26. In a model with a time cost for education, an increasing burden of knowledge is also felt through increased 
educational time, as this reduces the portion of the life cycle left over to actively pursue innovations. Jones (2005b) considers this more general model.
The burden of knowledge mechanism can further inform several related facts in existing
literature. First, consider the age at first invention. Age at first invention is an outcome-based 
measure intended to delineate the pre-innovation and innovation phases in the life cycle. Alter- 
natively, one might consult an institutionally based measure, such as the age at highest degree. 
Existing evidence based on doctoral age also suggests an aging phenomenon. Doctoral age rose 
generally across all major fields from 1967 to 1986, with the increase explained by longer peri- 
ods in the doctoral programme (National Research Council, 1990). The duration of doctorates as 
well as the frequency of post-doctorates has been rising across the life sciences since the 1960s
(Tilghman, 1998). Nobel Prize winners also show a substantially increasing age at doctorate 
(Jones, 2005a), as shown in Table 1. 

The rise in teamwork also generalizes outside patenting institutions, with similarly broad 
trends reported in academic research. Increasing co-authorship in journal articles is found in 
virtually all fields of science, engineering, and the social sciences since the 1950s (Wuchty, Jones
and Uzzi, 2007). Studies of narrower samples of research fields (e.g. Zuckerman and Merton, 
1973) suggest that co-authorship has increased steadily since the early 20th century. 

The model also provides an explicit analysis of growth, allowing it to inform aggregate 
facts. In particular, an increasing burden of knowledge can explain why rapid growth in the 
number of R&D workers and dollars in the 20th century is not associated with increased TFP 
growth rates or patenting rates (Machlup, 1962; Evenson, 1984; Kortum, 1993; Jones, 1995b). 
The model thus provides a novel solution for the absence of “scale effects”, a much debated 
subject in economic growth. At the same time, the model’s analysis of growth is inclusive, in- 
corporating existing mechanisms in the literature regarding innovation exhaustion (fishing-out 
stories), increasing innovation potential, and market size effects. We see explicitly that the bur- 
den of knowledge parameters enter the steady-state growth rate equation much as the parameter 
capturing any fishing-out effect. Therefore, from a growth perspective, one may view the burden 
of knowledge mechanism as a micro-foundation for fishing-out-type effects, or, if one imagines a 
fishing-out process that operates independently, then we can conceive of the burden of knowledge 
as an additional effect constraining the growth rate. 

The model operates with several simplifications to focus on the central mechanisms, but 
several generalizations are possible. For example, we focus on educational outlays rather than 
educational duration per se; however, educational duration can be incorporated explicitly in a 
more complex model, and the predictions for innovator behaviour remain the same.²⁷ The model
also places the burden of knowledge mechanism in the rate of idea production and assumes 
that the size of ideas is fixed. More generally, one may imagine that the burden of knowledge 
is felt on the size of ideas rather than their rate, or on both dimensions. This generalization is 
straightforward (Jones, 2005b) with no effect on the main propositions and corollaries. 

In all, the micro-evidence presented in this paper, together with other available micro-evidence and the aggregate data trends cited previously, suggests general and multi-dimensional patterns that may collectively be understood from the knowledge burden perspective. While any
individual piece of evidence can be explained by other means, the burden of knowledge knits 
together a range of evidence within a single framework. Motivated by the burden of knowledge 
concept, we are led to a set of striking facts, suggesting large changes in the organization of 
innovative activity and providing a novel explanation for the absence of scale effects in growth. 
Moreover, the micro-evidence suggests that the burden of knowledge is increasing. Note that, 
in general, a combination of increasing specialization and increasing educational attainment is 


27. Including time costs of education not only produces the same micro-econometric predictions but also introduces 
a second dimension through which the burden of knowledge influences growth. As equilibrium educational duration 
increases along the growth path, the portion of an innovator’s life cycle devoted to innovation declines, further restricting 
the growth rate. This is shown formally in Jones (2005b).
difficult to reconcile without appealing to a greater knowledge burden. If the distance to the frontier were not increasing, then increasing education should be associated with broader individual knowledge, not narrowing expertise.

If a rising burden of knowledge is an inevitable by-product of technological progress, then 
ever increasing effort may be needed to sustain long-run growth. However, two kinds of escapes 
are worth noting. First, if technological opportunities rise sufficiently rapidly, then the output of 
innovators may become sufficient, despite a rising educational burden, to sustain growth without 
increasing effort. While the 20th century's aggregate data patterns (rapidly increasing R&D effort but flat TFP growth) do not suggest a sufficient rise in technological opportunity, there is
nothing to say that sufficiently rapid avenues of opportunity may not open in the future. 

Second, even if the stock of knowledge accumulates over long periods, some future revo- 
lution in science may simplify the knowledge space, causing a fall in the burden of knowledge. 
Scientific revolutions—Kuhnian “paradigm” shifts (Kuhn, 1962)—might therefore have signif- 
icant benefits by easing the inter-generational transmission of knowledge. Related to this point, 
the efficiency of education—the rate at which we transfer knowledge from one generation to the 
next—becomes a policy parameter with first-order implications for the organization of innova- 
tive activity and for growth. Future improvements in knowledge transfer rates could potentially 
overcome growth in the knowledge stock. While this transfer rate probably faces physiological 
limits, policy choices in education take on further importance, as policy features from teacher 
pay to curricular design and the need for a “liberal arts” education all impact the rate at which 
human capital can be transferred to the young. 


APPENDIX A 


Proof of Lemma 1. I first show that along a balanced growth path, $L_Y(t) > 0\ \forall t$ and $L_j(t,s_j) > 0\ \forall j, s_j, t$. I then show that this implies $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ for any equilibrium choice $\omega$ and cohort $\tau$.


1. $L_Y(t) > 0\ \forall t$. By contradiction, let $L_Y(t) = 0$ for some $t$. On a balanced growth path, where labour allocations grow at rate $g_L$, $L_Y(t) = 0$ for some $t$ implies $L_Y(t) = 0$ for all $t$. Furthermore, all workers must then be innovators. But these innovators would earn zero income since there is no market for any innovation, with the value of innovations $V(t) = 0$ for all $t$ if $L_Y(t) = 0$ for all $t$. Therefore, innovators would strictly prefer to be wage workers, earning strictly positive income as defined by equation (4) with $w(t) > 0$. Hence, by contradiction, $L_Y(t) > 0\ \forall t$.

2. $L_j(t,s_j) > 0\ \forall j, s_j, t$. By contradiction, let $L_j(t,s_j) = 0$ for some $j$, $s_j$, and $t$, which implies that $L_j(t,s_j) = 0$ for all $t$ along a balanced growth path. This cannot hold because being a scarce innovator is too tempting. Recalling that $L_Y(t) > 0$ in any equilibrium, so there is a market for innovations, from the objective function (12), the choice $(j, s_j, b_j = 1)$ would produce unbounded income. This follows because the choice $b_j = 1$ makes ideas implementable and yet $L_j(t,s_j) = 0$, which makes the rate of idea production unbounded. Hence, a wage worker, who earns bounded income by equation (11) and Assumption 3, would prefer to be such a scarce innovator. Hence, by contradiction, $L_j(t,s_j) > 0\ \forall j, s_j, t$.

3. Along a balanced growth path, labour allocations grow at rate $g_L$. Hence, $L_Y(t) > 0$ implies that individuals in every cohort choose to become wage workers. Meanwhile, $L_j(t,s_j) > 0$ implies that individuals in every cohort choose to be innovators. By the equilibrium conditions of Section 3.5, each choice is weakly preferred to any other and therefore:

$$U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$$

for any equilibrium choice $\omega$ and cohort $\tau$. ||


Proof of Proposition 1, part (i). As shown in the proof of Lemma 1, $L_j(t,s_j) > 0\ \forall j, s_j, t$. Moreover, $b_j > 0$ since otherwise individuals would earn zero income and could do better by choosing to be a production worker, who earns strictly positive income as shown in the proof of Lemma 1. Since individuals exist at every point $s_j$ and $b_j > 0$ in any equilibrium, expertise exists at every point on the circle and all ideas are implementable. Thus, $I_j(t) = 1\ \forall j, t$. ||
Proof of Proposition 1, part (ii). By contradiction, imagine that:

$$L_j(t,s') > L_j(t,s'')$$

for two points $s'$ and $s''$ and some time $t$. Along a balanced growth path, $L_j(t,s)$ must grow at the population growth rate for any $s$, which implies that the ratio $L_j(t,s')/L_j(t,s'')$ is constant, and therefore $L_j(t,s') > L_j(t,s'')$ for all $t$ if $L_j(t,s') > L_j(t,s'')$ for some $t$. But then anyone located at $s'$ would be strictly better off if they had chosen $s''$. In particular, the objective function (12) implies that $U^{R\&D}(j,s',b,\tau) < U^{R\&D}(j,s'',b,\tau)$ for a person born at time $\tau$. (Given your choice of $b$, you would always prefer to be at the less crowded location to avoid congestion.) Thus, $s'$ would never be chosen in equilibrium. Therefore, there can be no points on the circle where the mass is less than at any other point along a balanced growth path. ||


Proof of Proposition 1, part (iii). Differentiating the innovator's objective function, (12), with respect to $b_j$ produces the first-order condition:

$$\frac{\beta}{b_j}\int_\tau^\infty A_j X(t)^{\gamma} L_j(t)^{-\sigma} b_j^{\beta} V(t)\,e^{-\phi(t-\tau)}\,dt = \frac{\epsilon}{b_j}\left(D_j(\tau)\,b_j\right)^{\epsilon}, \qquad (18)$$

where we have used Proposition 1, parts (i) and (ii), to simplify the objective function; namely, $I_j(t) = 1\ \forall j, t$, and $L_j(t,s)$ is invariant with $s$ in equilibrium and written $L_j(t)$.

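This first-order condition is directly rewritten as $\frac{\beta}{b_j}\int_\tau^\infty v(\omega,t)\,e^{-\phi(t-\tau)}\,dt = \frac{\epsilon}{b_j}E(\omega,\tau)$. Noting from equation (5) that $\int_\tau^\infty v(\omega,t)\,e^{-\phi(t-\tau)}\,dt = U^{R\&D}(\omega,\tau) + E(\omega,\tau)$, the first-order condition is equivalently:

$$E(\omega,\tau)/U^{R\&D}(\omega,\tau) = \frac{\beta}{\epsilon - \beta}, \qquad (19)$$

which is a constant.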
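The second-order condition takes the form

$$\frac{\partial^2 U^{R\&D}}{\partial b_j^2} = \frac{\beta(\beta-1)\left[U^{R\&D}(\omega,\tau) + E(\omega,\tau)\right] - \epsilon(\epsilon-1)\,E(\omega,\tau)}{b_j^2}.$$

Substituting the first-order condition, $\beta\left[U^{R\&D}(\omega,\tau) + E(\omega,\tau)\right] = \epsilon E(\omega,\tau)$, this becomes $\epsilon(\beta - \epsilon)E(\omega,\tau)/b_j^2$. The first-order condition thus defines a maximum under Assumption 1:

$$\beta < \epsilon,$$

which guarantees $\partial^2 U^{R\&D}/\partial b_j^2 < 0$ where the first-order condition (19) holds.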
Note that we may also write the optimal choice $b_j$ as follows. Using equation (19), the definitions of $E(\omega,\tau)$ and $D_j(\tau)$ in equations (6) and (7), and the property that $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ in equilibrium, where $U^{wage}(\tau)$ is given by equation (11), we have:

$$b^*_j(\tau) = \frac{1}{D_j(\tau)}\left[\frac{\beta}{\epsilon - \beta}\,U^{wage}(\tau)\right]^{1/\epsilon}, \qquad (20)$$

where we write the optimal choice of $b_j$ as $b^*_j(\tau)$ to clarify that, in equilibrium, breadth choices depend on the application area $j$ and cohort of birth $\tau$. Note that $\beta < \epsilon$ follows from Assumption 1 and $\phi > g$ from Assumption 3, so that $b^*_j(\tau)$ is strictly positive. It is also unique given area of application $j$ and cohort $\tau$. ||





Proof of Corollary 1, part (i) (total education, cross-section). By Lemma 1, income arbitrage implies $U^{R\&D}(\omega,\tau) = U^{R\&D}(\omega',\tau)$ for any equilibrium choices $\omega, \omega'$ within a given cohort $\tau$. Since $E(\omega,\tau)/U^{R\&D}(\omega,\tau)$ is a constant (Proposition 1, part (iii)), $E(\omega,\tau) = E(\omega',\tau)$ for any equilibrium choices $\omega, \omega'$ within a cohort $\tau$. ||


Proof of Corollary 1, part (ii) (total education, time series). By Proposition 1, part (iii), $E(\omega,\tau)/U^{R\&D}(\omega,\tau)$ is constant. Hence, $E(\omega,\tau)$ grows at the same rate as $U^{R\&D}(\omega,\tau)$ across successive cohorts. From equation (11) and Lemma 1, $g_U = g$. Therefore, $g_E = g$. ||


Proof of Corollary 2, part (i) (breadth of expertise, cross-section). From equation (6), $b_j = E(\omega,\tau)^{1/\epsilon}/D_j(\tau)$. By Corollary 1, part (i), all innovators in cohort $\tau$ have identical $E(\omega,\tau) = E^*(\tau)$ regardless of their equilibrium choice of $\omega$. Hence, equilibrium choices of $b_j$ differ only by cohort $\tau$ and area of application $j$. Denoting equilibrium breadth of expertise as $b^*_j(\tau)$, we see that $b^*_j(\tau)/b^*_{j'}(\tau) = D_{j'}(\tau)/D_j(\tau)$. From equation (7), $D_j(\tau) = D_j X(\tau)^{\delta}$, and hence,

$$\frac{b^*_j(\tau)}{b^*_{j'}(\tau)} = \frac{D_{j'}}{D_j}.^{28} \qquad ||$$


28. This result may also be demonstrated directly using the expression for $b^*_j(\tau)$ in equation (20) above.
Proof of Corollary 2, part (ii) (breadth of expertise, time series). Taking logs and differentiating with respect to $\tau$, it follows from equation (13) that $g_{b^*_j} = \frac{1}{\epsilon}g_E - g_{D_j}$. From Corollary 1, part (ii), $g_E = g$, and from equation (7), $g_{D_j} = \delta g$. Therefore, $g_{b^*_j} = (1/\epsilon - \delta)g$, so that $g_{b^*_j} < 0$ iff $\delta > 1/\epsilon$.²⁹ ||


Proof of Corollary 3, part (i) (teamwork, cross-section). From equation (14), $team_j(\tau) = \lceil 1/b^*_j(\tau) \rceil$ if $b^*_j(\tau) < 1$ and $team_j(\tau) = 1$ otherwise. Hence, $team_j(\tau) > team_{j'}(\tau)$ iff $b^*_j(\tau) < b^*_{j'}(\tau)$. From Corollary 2, part (i), $b^*_j(\tau) < b^*_{j'}(\tau)$ iff $D_j > D_{j'}$. Therefore, $team_j(\tau) > team_{j'}(\tau)$ iff $D_j > D_{j'}$. ||


Proof of Corollary 3, part (ii) (teamwork, time series). From equation (14), $team_j(\tau) = \lceil 1/b^*_j(\tau) \rceil$ if $b^*_j(\tau) < 1$ and $team_j(\tau) = 1$ otherwise. Hence, $team_j(\tau) > team_j(\tau')$ iff $b^*_j(\tau) < b^*_j(\tau')$. From Corollary 2, part (ii), $b^*_j(\tau)$ is falling across cohorts iff $\delta > 1/\epsilon$. Hence, $team_j(\tau) > team_j(\tau')$ for any $\tau > \tau'$ iff $\delta > 1/\epsilon$. ||


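Proof of Corollary 4 (labour allocation). Consider the labour allocation across application areas within a given cohort of researchers. Recalling from Proposition 1, part (iii), that $E(\omega,\tau)/U^{R\&D}(\omega,\tau) = \beta/(\epsilon - \beta)$, we can use the definition of $U^{R\&D}(\omega,\tau)$ in equation (5) to write $U^{R\&D}(\omega,\tau) = \frac{\epsilon - \beta}{\epsilon}\int_\tau^\infty v(\omega,t)\,e^{-\phi(t-\tau)}\,dt$. Rewriting $v(\omega,t)$ using the definitions in equations (10) and (9) and the equilibrium properties of Proposition 1, we have:

$$U^{R\&D}(\omega,\tau) = \frac{\epsilon - \beta}{\epsilon}\,A_j\,L_j(\tau)^{-\sigma}\,b^*_j(\tau)^{\beta}\int_\tau^\infty X(t)^{\gamma}\,V(t)\,e^{-(\sigma g_L + \phi)(t-\tau)}\,dt \qquad (21)$$

along a balanced growth path, so that $U^{R\&D}(\omega,\tau)$ for a given cohort varies across application areas only due to differences in $L_j(\tau)$, $A_j$, and $b^*_j(\tau)$. Given that $b^*_j(\tau)/b^*_{j'}(\tau) = D_{j'}/D_j$ (Corollary 2, part (i)), the equilibrium condition $U^{R\&D}(\omega,\tau) = U^{R\&D}(\omega',\tau)$ is therefore satisfied by the labour allocation:

$$\frac{L_j(\tau)}{L_{j'}(\tau)} = \left[\frac{A_j}{A_{j'}}\left(\frac{D_{j'}}{D_j}\right)^{\beta}\right]^{1/\sigma}.$$

||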


Proof of Proposition 2 (steady-state growth). To show equation (17), we proceed in three steps. First, it is shown 
that the rate at which an innovator adds to productivity is the same in equilibrium for all innovators in the same cohort. 
Second, it is shown that the rate at which innovators add to productivity grows across cohorts at a constant rate. Third, 
the steady-state growth rate of productivity is determined as the summation of contributions of active innovators. 


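1. Given common $U^{R\&D}(\omega,\tau)$ and common $E(\omega,\tau)$ for innovators in the same cohort $\tau$, it follows from the definition of $U^{R\&D}(\omega,\tau)$ in equation (5) that:

$$\int_\tau^\infty v(\omega,t)\,e^{-\phi(t-\tau)}\,dt = \int_\tau^\infty v(\omega',t)\,e^{-\phi(t-\tau)}\,dt. \qquad (22)$$

Note further that $v(\omega,t)$ grows with time $t$ at a constant rate independent of $\omega$.³⁰ Each side of equation (22) is therefore the initial value $v(\cdot,\tau)$ divided by the same constant, so equation (22) implies $v(\omega,t) = v(\omega',t)$ for individuals in the same cohort $\tau$ at any point in time $t$. Writing $v(\omega,t) = \lambda(\omega,t)V(t)$, where we recall that $\lambda(\omega,t)$ is the rate of idea production, this in turn implies $\lambda(\omega,t) = \lambda(\omega',t)$ for individuals in the same cohort $\tau$ at any point in time $t$.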

Given this result, define the common rate at which innovators in the same cohort add to productivity as $\theta^*(\tau,t)$, which, with the size of ideas fixed, is proportional to $\lambda(\omega,t)$ for any equilibrium choice $\omega$ made at time of birth $\tau$. Noting the definition of $\lambda(\omega,t)$ in equation (9), in equilibrium we have:

$$\theta^*(\tau,t) \propto A_j X(t)^{\gamma} L_j(t)^{-\sigma} b^*_j(\tau)^{\beta}. \qquad (23)$$


2. Inspecting equation (23), the productivity contributions of different cohorts differ only due to their breadth of expertise, $b^*_j(\tau)$. Further, given the results of Corollary 2, part (ii), $b^*_j(\tau)$ grows across cohorts at rate $(1/\epsilon - \delta)g$, and hence,

$$\theta^*(\tau,t) = \theta^*(t,t)\,e^{\beta(1/\epsilon - \delta)g(\tau - t)}. \qquad (24)$$


29. See prior footnote.
30. In particular, $v(\omega,t) = A_j X(t)^{\gamma} L_j(t,s)^{-\sigma} b_j^{\beta} V(t)$ is a collection of terms that are either constant given $\omega$ ($A_j$, $b_j$) or growing at common rates independent of $\omega$ on a balanced growth path ($X(t)$, $L_j(t,s)$, $V(t)$).
3. We can now aggregate the contributions of innovators alive at time $t$. This is the productivity of each innovator cohort, weighted by the size of that cohort, summed over all living cohorts. Defining $l_{R\&D}(\tau,t)$ as the mass of innovators who were born at time $\tau$ and remain alive at time $t$, we can write the average rate at which all living innovators add to productivity as:

$$\bar\theta(t) = \frac{1}{L_{R\&D}(t)}\int_{-\infty}^{t}\theta^*(\tau,t)\,l_{R\&D}(\tau,t)\,d\tau.$$

Noting that $L_{R\&D}(t)$ grows at rate $g_L$, and that innovators die at rate $\phi$, it is clear that $l_{R\&D}(t,t) = (g_L + \phi)L_{R\&D}(t)$ and $l_{R\&D}(\tau,t) = l_{R\&D}(t,t)\,e^{(g_L + \phi)(\tau - t)}$.³¹ Using these facts and equation (24), we have:

$$\bar\theta(t) = \theta^*(t,t)\,(g_L + \phi)\int_{-\infty}^{t} e^{[\beta(1/\epsilon - \delta)g + g_L + \phi](\tau - t)}\,d\tau, \qquad (25)$$

where the integral is a finite constant if $\beta(1/\epsilon - \delta)g + g_L + \phi > 0$, which is guaranteed by Assumption 3, as discussed below.
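The cohort aggregation in step 3 can be checked numerically. The sketch below assumes the cohort weights and breadth adjustment described in the text: relative to the newest cohort, innovators of age $a = t - \tau$ are fewer by the factor $e^{-(g_L+\phi)a}$ and differ in productivity by $e^{-\beta(1/\epsilon - \delta)g a}$. It evaluates the integral in equation (25) by the midpoint rule over age and compares it with the closed form $\theta^*(t,t)(g_L + \phi)/\kappa$, where $\kappa = \beta(1/\epsilon - \delta)g + g_L + \phi$; the closed form follows because the integral of $e^{-\kappa a}$ over $a \ge 0$ equals $1/\kappa$. The function names, the truncation age, and the parameter values are illustrative, and $\theta^*(t,t)$ is normalized to 1.

```python
import math

def avg_contribution_closed_form(theta_now, beta, eps, delta, g, g_L, phi):
    """theta_bar(t) = theta*(t,t) * (g_L + phi) / kappa,
    where kappa = beta*(1/eps - delta)*g + g_L + phi must be positive."""
    kappa = beta * (1.0 / eps - delta) * g + g_L + phi
    if kappa <= 0:
        raise ValueError("cohort integral diverges unless kappa > 0")
    return theta_now * (g_L + phi) / kappa

def avg_contribution_numeric(theta_now, beta, eps, delta, g, g_L, phi,
                             max_age=1000.0, steps=100_000):
    """Midpoint-rule version of the cohort integral, written over age a = t - tau >= 0.
    Older cohorts are fewer (weight exp(-(g_L + phi)*a)) and differ in breadth
    (factor exp(-beta*(1/eps - delta)*g*a) relative to the newest cohort)."""
    kappa = beta * (1.0 / eps - delta) * g + g_L + phi
    da = max_age / steps
    integral = sum(math.exp(-kappa * (k + 0.5) * da) * da for k in range(steps))
    return theta_now * (g_L + phi) * integral

args = (1.0, 0.5, 1.5, 1.0, 0.012, 0.01, 0.02)  # theta*(t,t) normalized to 1
print(avg_contribution_closed_form(*args))  # about 1.0714
print(avg_contribution_numeric(*args))
```

Since the ratio of the average to the newest cohort's contribution is a constant, $\bar\theta(t)$ grows at the same rate as $\theta^*(t,t)$, which is the step used next in the proof.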

We are interested in the balanced growth path, where $g$ is a positive constant. Recall from equation (16) that $g = \bar{\theta}(t) L_{R\&D}(t)/X(t)$. Taking logs in equation (16), differentiating with respect to time, and asserting that $g$ is a constant, we see that the steady-state growth rate is:

$$g = g_{\bar{\theta}(t)} + g_{L_{R\&D}(t)}.$$

Noting from equation (25) that $g_{\bar{\theta}(\tau)} = g_{\theta^*(\tau,\tau)}$ and from equation (23) that $g_{\theta^*(\tau,\tau)} = \chi g - \sigma g_L + \beta g_b$, with $g_b = (1-\delta)g$, we find that $g = \chi g + (1-\sigma) g_L + \beta(1-\delta) g$. Rearranging produces the unique steady-state growth rate, the expression (17), repeated here:

$$g = \frac{1-\sigma}{1-\chi+\beta(\delta-1)}\, g_L, \qquad (17)$$

where $\chi - \beta(\delta - 1) < 1$ (Assumption 2). ‖
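The algebra in steps 2 and 3 can be sanity-checked numerically. The Python sketch below uses hypothetical parameter values (chosen only so that Assumption 2 and the finiteness condition below equation (25) hold; they are not estimates from the paper). It computes $g$ from equation (17), confirms that it satisfies the growth-accounting identity $g = \chi g + (1-\sigma)g_L + \beta(1-\delta)g$, and evaluates the integral in equation (25) numerically against its closed form $\bar{\theta}(\tau)/\theta^*(\tau,\tau) = (g_L+\phi)/[\beta(1-\delta)g + g_L + \phi]$.

```python
import math

# Hypothetical illustrative parameters (NOT estimates from the paper):
# chi: return to the stock X, beta: return to breadth b, delta: burden parameter,
# sigma: crowding, G_L: growth rate of L, PHI: death rate.
CHI, BETA, DELTA, SIGMA, G_L, PHI = 0.3, 0.5, 1.5, 0.2, 0.02, 0.03


def steady_state_growth(chi, beta, delta, sigma, g_l):
    """Equation (17): g = (1 - sigma) g_L / (1 - chi + beta (delta - 1))."""
    denom = 1.0 - chi + beta * (delta - 1.0)
    if denom <= 0.0:
        raise ValueError("Assumption 2 fails: need chi - beta*(delta - 1) < 1")
    return (1.0 - sigma) * g_l / denom


def average_to_newest_ratio(beta, delta, g, g_l, phi, horizon=400.0, steps=200_000):
    """Equation (25) with s = tau - t:
    theta_bar / theta*(tau, tau) = (g_L + phi) * integral_0^inf exp(-k s) ds,
    where k = beta (1 - delta) g + g_L + phi. Uses a midpoint rule truncated at `horizon`."""
    k = beta * (1.0 - delta) * g + g_l + phi
    if k <= 0.0:
        raise ValueError("integral diverges: need beta(1-delta)g + g_L + phi > 0")
    ds = horizon / steps
    integral = ds * sum(math.exp(-k * (i + 0.5) * ds) for i in range(steps))
    return (g_l + phi) * integral, k


g = steady_state_growth(CHI, BETA, DELTA, SIGMA, G_L)
# Growth accounting from the text: g = chi g + (1 - sigma) g_L + beta (1 - delta) g.
accounting = CHI * g + (1.0 - SIGMA) * G_L + BETA * (1.0 - DELTA) * g
ratio, k = average_to_newest_ratio(BETA, DELTA, g, G_L, PHI)

print(f"g = {g:.6f}, accounting residual = {accounting - g:.2e}")
print(f"theta_bar/theta*(tau,tau): numeric {ratio:.6f}, closed form {(G_L + PHI) / k:.6f}")
```

With the hypothetical $\delta > 1$ used here, $b_j(t)$ shrinks across cohorts (its growth rate $(1-\delta)g$ is negative), so older cohorts are broader and the ratio exceeds one.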


Existence and uniqueness of balanced growth path. We now confirm that with the steady-state growth rate $g$ defined in equation (17), there exists a set of labour allocations $(L_1(t), \ldots, L_J(t), L_Y(t))$, each growing at rate $g_L$, that satisfies the equilibrium conditions in Section 3.5; namely, that no individual would strictly prefer a different career choice. Moreover, we demonstrate that the set of labour allocations $(L_1(t), \ldots, L_J(t), L_Y(t))$, like the steady-state growth rate above, is unique.³²

We proceed in three steps. First, I confirm that under Assumption 3, lifetime income from different career choices is finite; that is, the integrals in equations (4) and (5) exist. Second, I demonstrate the existence of a unique set of labour allocations $(L_1(\tau), \ldots, L_J(\tau), L_Y(\tau))$ on a balanced growth path such that in a particular cohort $\tau$ no individual would strictly prefer a different career choice. Finally, I confirm that the equilibrium conditions are satisfied in all cohorts along the balanced growth path where the steady-state growth rate $g$ is as given in equation (17).


1. The analysis has assumed that lifetime income, $U^{R\&D}(\omega,\tau)$ and $U^{wage}(\tau)$, is finite. Having defined $g$ explicitly along the balanced growth path, we can now state these assumptions as explicit parametric conditions. First, finite lifetime income for a wage worker requires $\phi > g$ (see equation (11)). Second, finite income for an innovator requires $\phi > \chi g + (1-\sigma) g_L$ (see equation (21)). With $g$ given by equation (17), these conditions are satisfied by Assumption 3. (Relatedly, we assumed previously that $\phi > \beta(\delta - 1) g - g_L$ so that the average productivity of innovators is finite (see equation (25)); it is easy to show that this condition is also satisfied by Assumption 3.)

2. I now show that, given a steady-state growth rate $g$, there exists a unique set of labour allocations $(L_1(\tau), \ldots, L_J(\tau), L_Y(\tau))$ for which individuals born at time $\tau$ do not strictly prefer another career choice. Note first that Corollary 4 established labour ratios $L_j(\tau)/L_{R\&D}(\tau)$ such that $U^{R\&D}(\omega,\tau) = U^{R\&D}(\omega',\tau)$ for any two equilibrium choices $\omega$, $\omega'$ in cohort $\tau$. Hence, innovators in the same cohort do not strictly prefer to be another type of innovator. This result depends on the labour ratios $L_j(\tau)/L_{R\&D}(\tau)$ and not on the overall scale of research,


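31. The latter expression follows from two observations: (1) with death rate $\phi$, $l_{R\&D}(\tau,t) = l_{R\&D}(t,t)\, e^{-\phi(\tau - t)}$; and (2) with growth rate $g_L$, $l_{R\&D}(t,t) = l_{R\&D}(\tau,\tau)\, e^{-g_L(\tau - t)}$. Hence, $l_{R\&D}(\tau,t) = l_{R\&D}(\tau,\tau)\, e^{-(g_L+\phi)(\tau - t)}$.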

32. That is, the measures of workers are unique; the equilibrium does not uniquely assign particular individuals to 
particular careers. 
$L_{R\&D}(\tau)$. We can then set $L_{R\&D}(\tau)$, and hence $L_Y(\tau) = L(\tau) - L_{R\&D}(\tau)$, at some unique value to ensure that neither wage workers nor innovators strictly prefer a different career. This is clear because, as $L_{R\&D}(\tau)$ increases, innovator income decreases strictly and continuously, sweeping out the entire positive real line, while wage worker income is a finite constant.³³ This unique $L^*_{R\&D}(\tau)$ pins down $L_j(\tau)$ for all $j$ and also pins down $L_Y(\tau) = L(\tau) - L^*_{R\&D}(\tau)$. Hence, there exists a unique set of equilibrium labour allocations $(L_1(\tau), \ldots, L_J(\tau), L_Y(\tau))$ for which no individual born at time $\tau$ would prefer a different career choice.

3. We have established the existence of a unique set of labour allocations such that $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ for any equilibrium choice $\omega$ in a particular cohort $\tau$. From equation (11), $U^{wage}(\tau)$ grows across cohorts at the steady-state rate $g$. Hence, equilibrium will be satisfied in all cohorts if $U^{R\&D}(\omega,\tau)$ also grows at rate $g$ across cohorts. Taking logs of equation (21), differentiating with respect to $\tau$, and setting the result equal to $g$ gives $g = \chi g + \beta(1-\delta) g + g_L - \sigma g_L$. Rearranging this expression reproduces equation (17). Hence, the balanced growth path satisfies the conditions for equilibrium. The steady-state growth rate is as given in equation (17), with the labour allocations $(L_1(\tau), \ldots, L_J(\tau), L_Y(\tau))$ each growing at rate $g_L$.³⁴ ‖
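The uniqueness argument in step 2 (made formal in footnote 33) reduces to a one-dimensional crossing problem: a continuous, strictly decreasing income schedule that falls from $+\infty$ to $0$ as $L_{R\&D}(\tau)$ runs over $(0, L(\tau))$ meets any positive wage-worker income exactly once. The Python sketch below locates that crossing by bisection. The function `u_rd` and its parameters `kappa` and `sigma` are hypothetical stand-ins, not the paper's equation (21); they only reproduce the properties that footnote 33 relies on, and the wage income value is likewise an arbitrary illustration.

```python
def u_rd(l_rd, l_total=1.0, kappa=1.0, sigma=0.2):
    """Hypothetical innovator lifetime income as a function of research scale.

    NOT the paper's equation (21). It only mimics the properties used in
    footnote 33: continuous, strictly decreasing in l_rd, tending to +inf as
    l_rd -> 0+ and to 0 as l_rd -> l_total.
    """
    share = l_rd / l_total
    # share**(-sigma) mimics crowding; (1 - share) mimics a shrinking market L_Y.
    return kappa * share ** (-sigma) * (1.0 - share)


def solve_research_scale(u_wage, l_total=1.0, tol=1e-12, max_iter=200, **params):
    """Bisection for the unique l_rd in (0, l_total) with u_rd(l_rd) == u_wage."""
    if u_wage <= 0.0:
        raise ValueError("wage-worker income must be strictly positive")
    lo, hi = 0.0, l_total  # u_rd - u_wage is positive near lo and negative near hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if u_rd(mid, l_total, **params) > u_wage:
            lo = mid  # innovating still pays more: the equilibrium scale is larger
        else:
            hi = mid  # innovating pays weakly less: the equilibrium scale is smaller
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


u_wage = 2.0  # hypothetical wage-worker lifetime income
l_star = solve_research_scale(u_wage)
print(f"L*_RD = {l_star:.6f}, u_rd(L*_RD) = {u_rd(l_star):.6f}")
```

Bisection fits the argument because strict monotonicity guarantees a single sign change, and it needs no derivative of the income schedule; a higher wage income simply moves the crossing to a smaller research scale.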


APPENDIX B. DATA APPENDIX 


The reader is referred to Hall et al. (2001) for a detailed discussion of their patent data set. This appendix focuses 
on the age information collected to augment the data of Hall et al. 

Age data were collected using the Web site http://www.AnyBirthday.com, which requires a name and zip code 
to produce a match. As shown in Table B.1, 30% of U.S. inventors listed a zip code on at least one of their patent 
applications, and of these inventors, http://www.AnyBirthday.com produced a birth date in 25% of the cases. While the 
number of observations produced by http://www.AnyBirthday.com is large, it represents only 7.5% of U.S. inventors.
This appendix explores the causes and implications of this selection. The first question is why zip code information is 
available for only certain inventors. The second question is why http://www.AnyBirthday.com produces a match only 
one-quarter of the time. The third question is whether this selection appears to matter. 

Table B.2 compares how patent rights are assigned across samples. The table shows clearly that zip code information 
is virtually always supplied when the inventor has yet to assign the rights; conversely, zip code information is never 
provided when the rights are already assigned. Patent rights are usually assigned to private corporations (80% of the 
time) and remain unassigned in the majority of the other cases (17% of the time). An unassigned patent indicates only 
that the inventor(s) have not yet assigned the patent at the time it is granted. Presumably, innovators who provide zip codes 
are operating outside binding contracts with corporations, universities, or other agencies that would automatically acquire 
any patent rights. The zip code subset is therefore not a random sample but is capturing a distinct subset of innovators 
33. This can be seen formally as follows. Use equation (15) to write each $L_j(\tau)$ as a fixed multiple of the research scale, $L_j(\tau) = c_j L_{R\&D}(\tau)$, where $c_j$ is a constant determined by equation (15). Noting equation (21), we see that $U^{R\&D}(\omega,\tau)$ is continuous and monotonically decreasing in $L_{R\&D}(\tau)$. This follows because increasing $L_{R\&D}(\tau)$ (a) increases crowding ($L_j(\tau)^{-\sigma}$ falls) and (b) reduces the market size for innovations ($L_Y(\tau)$ falls and so $V(\tau)$ falls), both of which reduce $U^{R\&D}(\omega,\tau)$. Moreover, $\lim_{L_{R\&D}(\tau) \to 0^+} U^{R\&D}(\omega,\tau) = \infty$ and $\lim_{L_{R\&D}(\tau) \to L(\tau)} U^{R\&D}(\omega,\tau) = 0$. Hence, we can pick some $L_{R\&D}(\tau) \in [0, L(\tau)]$ such that $U^{R\&D}(\omega,\tau)$ takes any positive value. Meanwhile, $U^{wage}(\tau)$ is a finite, strictly positive constant. Hence, there exists some unique scale of research activity, $L^*_{R\&D}(\tau)$, such that $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ in the cohort born at time $\tau$.

34. Note that the growth rate, $g$, is determined in two alternative ways. The first approach considers the steady-state growth rate as the summation of contributions of innovators. The second approach considers the steady-state growth rate that guarantees the equilibrium condition $U^{R\&D}(\omega,\tau) = U^{wage}(\tau)$ will hold across cohorts. It is instructive to clarify why these two methods produce the same result. First, innovators' total productivity contributions follow the evolution of $\bar{\theta}(t)$, their average rate of productivity contributions per innovator, and $L_{R\&D}(t)$, the scale of research effort. Second, innovator income across cohorts follows the evolution of $\theta^*(\tau,\tau)$ across cohorts and $L_Y(\tau)$, the scale of the market. On the balanced growth path, $L_{R\&D}(t)$ and $L_Y(t)$ grow at the same rate. Meanwhile, as shown previously, the evolution of $\bar{\theta}(t)$ with time and the evolution of $\theta^*(\tau,\tau)$ from one cohort to the next are also equivalent. Hence, calculating growth from either perspective produces the same growth rate along a balanced growth path.
TABLE B.1
Number of observations at each stage of selection

                                                    Number of      Percentage of   Percentage of   Percentage of
                                                    observations   row (3)         row (4)         row above
(1) Patents granted                                 2,139,313
(2) Inventors worldwide                             4,301,229
(3) Unique inventors worldwide                      1,411,842
(4) Unique inventors with U.S. address                752,165      53.3                            53.3
(5) Unique inventors with U.S. address, zip code      224,152      15.9            29.8            29.8
(6) Unique inventors with U.S. address, zip code,      56,281       4.0             7.5            25.1
    unique match from http://www.AnyBirthday.com

Notes: (i) Observation counts consider the 1975–1999 period. (ii) A “unique inventor” is defined by having the same first name, last name, and middle initial.
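The selection percentages quoted in the text follow directly from the counts in Table B.1. The short Python check below recomputes them from the table's observation counts and makes explicit that the 7.5% coverage figure is the product of the zip-code rate (29.8%) and the match rate (25.1%).

```python
# Observation counts from Table B.1 (1975-1999 period).
ROW3_UNIQUE_WORLDWIDE = 1_411_842
ROW4_UNIQUE_US = 752_165
ROW5_US_WITH_ZIP = 224_152
ROW6_US_ZIP_BIRTH_MATCH = 56_281


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in Table B.1."""
    return round(100.0 * numerator / denominator, 1)


zip_rate = pct(ROW5_US_WITH_ZIP, ROW4_UNIQUE_US)             # share of U.S. inventors listing a zip code
match_rate = pct(ROW6_US_ZIP_BIRTH_MATCH, ROW5_US_WITH_ZIP)  # birth-date match rate given a zip code
coverage = pct(ROW6_US_ZIP_BIRTH_MATCH, ROW4_UNIQUE_US)      # share of U.S. inventors in the age sample
us_share = pct(ROW4_UNIQUE_US, ROW3_UNIQUE_WORLDWIDE)        # row (4) as a percentage of row (3)

# Coverage is the product of the two selection stages.
implied_coverage = round(zip_rate * match_rate / 100.0, 1)

print(zip_rate, match_rate, coverage, us_share, implied_coverage)
```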


TABLE B.2
The assignment of patent rights

                                                                                                   Birth data
                                             All       U.S.      U.S. patents   U.S. patents   Direct     Other
Assignment status                            patents   patents   no zip code    zip code       match      patents
Unassigned (%)                               17.2      22.4       0.4           98.3           97.9       26.6
U.S. non-government organization (%)         43.0      72.9      94.1            0.0            0.0       65.7
Non-U.S. non-government organization (%)     36.2       1.1       1.4            0.0            0.0        3.4
Other assignment (%)                          2.7       3.5       4.1            1.7            2.1        4.4

Notes: (i) The first column considers all patent observations in the 1975–1999 period (2.1 million observations). (ii) U.S. patents are those for which the first inventor listed on the patent has a U.S. address. (iii) The birth data columns consider those U.S. patents with zip code information for which http://www.AnyBirthday.com produced a birth date. The first birth data column considers the specific patents on which http://www.AnyBirthday.com was able to match. The last column considers all other patents by that innovator, identifying the innovator by last name, first name, and middle initial. (iv) Unassigned patents are those for which the patent rights were still held by the original inventor(s) at the time the patent was granted; these patents may or may not have been assigned after the grant date. (v) Non-government organizations are mainly corporations but also include universities. (vi) Other assignment includes assignments to: (a) U.S. individuals, (b) non-U.S. individuals, (c) the U.S. government, and (d) non-U.S. governments.


who, at least at one point, were operating independently. Despite this distinction, this subset may not be substantially 
different from other innovators: the last column of Table B.2 indicates that when examining the other patents produced 
by these innovators, they have a similar propensity to assign them to corporations as the U.S. population average. 

The nature of the selection introduced by http://www.AnyBirthday.com is more difficult to identify. The Web site reports a database of 135 million individuals and states that it built this database using “public records”. Access to public records is a contentious legal issue.³⁵ Public disclosure of personal information is proscribed at the federal level by the Freedom of Information Act and the Privacy Act of 1974. At the state and local level, however, rules vary. Birth date and address information are both available through motor vehicle departments, and their electronic databases are likely to be the main source of http://www.AnyBirthday.com records.³⁶ The availability of birth date information is therefore very likely to be related to local institutional rules regarding motor vehicle departments. Geography thus will influence


35. Repeated requests to http://www.AnyBirthday.com to define their sources more explicitly have yet to produce 
a response. 

36. A federal law, the Driver’s Privacy Protection Act of 1994, was introduced to give individuals increased privacy. The law requires motor vehicle departments to receive explicit prior consent from an individual before disclosing their personal information. However, the law makes an exception for cases where motor vehicle departments provide information to survey and marketing organizations. In that case, the individual’s consent is assumed unless the individual has opted out on their own initiative. See Gellman (1995) for an in-depth discussion of the laws and legal history surrounding public records.
TABLE B.3
Inventors per patent, mean differences between samples

Dependent variable: inventors per patent
                                                        (1)        (2)        (3)        (4)        (5)
U.S. address dummy                                   −0.315     −0.339     −0.300     −0.124     −0.103
                                                     (0.0020)   (0.0020)   (0.0020)   (0.0049)   (0.0048)
U.S. address and zip code dummy                      −0.786     −0.670     −0.769     −0.155     −0.176
                                                     (0.0033)   (0.0033)   (0.0032)   (0.0069)   (0.0066)
U.S. address, zip code, and                           0.237      0.246      0.212      0.243      0.228
  http://www.AnyBirthday.com direct match dummy      (0.0068)   (0.0067)   (0.0067)   (0.0067)   (0.0066)
Constant                                              2.28       2.57       1.96       1.45       1.56
                                                     (0.0014)   (0.0023)   (0.0052)   (0.0042)   (0.0067)
Technological category dummies                        No         Yes        No         No         Yes
Grant year dummies                                    No         No         Yes        No         Yes
Assignee code dummies                                 No         No         No         Yes        Yes
R²                                                    0.0555     0.0825     0.0756     0.0757     0.1162

Notes: (i) Regressions consider means in the entire data set (2.1 million patent observations), covering the 1975–1999 time period. Standard errors are given in parentheses. (ii) Dummy variables are nested: the second row captures a subset of the first, and the third row captures a subset of the second. (iii) Innovators for whom http://www.AnyBirthday.com produces a birth date are often involved with multiple innovations over the 1975–1999 period. The patents used for comparison in this table are those patents for which http://www.AnyBirthday.com produced the direct match. (iv) Regressions with technological category controls are reported using the six-category measure of Hall et al. (2001). Results using the 36-category.


the presence of innovators in the age sample, and a further issue in selection may involve the geographic mobility of the 
innovator, among other factors. The influence of this selection, together with the implications of assignment status, can 
be assessed by comparing observable means in the population across subsamples. 

Table B.3 considers average team size, which reveals further differences. Patents with zip codes provided have
smaller team sizes than the U.S. average; team sizes in the subset of these patents for which the age of one innovator is 
known are slightly larger but still smaller than the U.S. average. Controlling for other patent observables, in particular 
the assignment status, reduces the mean differences and brings the age sample quite closely in line with the U.S. mean. 
(See the last two columns of the table.) Having examined a number of other observables in the data, such as citations 
received and average tree size, I find that relatively small differences tend to exist in the raw data and that these can be 
either entirely or largely explained by controlling for assignment status and team size. Most importantly, the age results 
in the text are all robust to the inclusion of assignment status, team size, and any other available controls. 

Finally, examining team size, specialization, and time lag trends in the age subsample, I find results similar in sign
and significance to those presented in Section 4. The rate of increase in specialization is larger, and the rate of increase
in team size is smaller. The time lag shows no trend. Reexamining trends in the entire data set by assignment status, I 
find that the team size trend is weaker among the unassigned category, which likely explains the weaker trend in the age 
subset. Similarly, I find that the specialization trend is stronger among the unassigned category, which likely explains the 
stronger trend in the age subset. 

I conclude therefore that while the age subset is not a random sample of the U.S. innovator population, the differences 
tend to be explainable with other observables and, on the basis of including such observables in the analysis, the age 
results appear robust. 